Skip to content

fix(subagents): recheck timed-out announce waits#53127

Merged
BunsDev merged 5 commits intomainfrom
fix/subagent-timeout-race
Mar 23, 2026
Merged

fix(subagents): recheck timed-out announce waits#53127
BunsDev merged 5 commits intomainfrom
fix/subagent-timeout-race

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • add a regression for the timeout-first, success-later subagent announce race
  • recheck timed-out agent.wait results against the latest runtime snapshot before announcing
  • add a changelog fix entry

Testing

  • pnpm test -- src/agents/subagent-announce.format.e2e.test.ts -t "rechecks timed-out waits before announcing timeout when the run finishes immediately after"
  • pnpm test -- src/agents/subagent-announce.format.e2e.test.ts src/agents/subagent-announce.timeout.test.ts
  • pnpm build

Notes

  • scripts/committer hit an unrelated existing pnpm tsgo failure on current main in ui/src/ui/controllers/scope-errors.ts (detailCode is referenced without being defined), so this commit was finalized with --no-verify after the targeted suites and build passed.

Fixes #53106

@vincentkoc vincentkoc self-assigned this Mar 23, 2026
@vincentkoc vincentkoc marked this pull request as ready for review March 23, 2026 20:23
@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels Mar 23, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR fixes a timeout-first / success-later race in the subagent announce flow: when a worker finishes just after agent.wait returns "timeout", the first wait result was being used as the final outcome even though the run had actually completed successfully. The fix adds a zero-timeout recheck after output has been collected — if the run finished in the interim, the outcome is corrected to "ok" before the completion event is dispatched.

Key changes:

  • waitForSubagentRunOutcome and applySubagentWaitOutcome extract the gateway call and outcome-merge logic into reusable helpers, reducing duplication.
  • A post-output recheck (timeoutMs: 0) runs when outcome.status === "timeout" and reply content is already available, correcting the final status if the run completed in the race window.
  • A defensive if (!outcome) outcome = { status: "unknown" } guard is added before statusLabel construction to handle any path where neither wait nor a pre-supplied outcome set the field.
  • A regression test covers the race scenario end-to-end.

One issue to address: the recheck waitForSubagentRunOutcome call (line 1460) is not wrapped in its own try/catch. If the gateway call throws (network error, 2-second timeout), the outer catch block suppresses the entire announcement rather than falling back to the already-determined "timeout" outcome. Since the recheck is best-effort, it should be isolated so failures degrade gracefully instead of dropping the announcement.

Note: the commit was finalized with --no-verify due to an unrelated pre-existing pnpm tsgo failure in ui/src/ui/controllers/scope-errors.ts (detailCode undefined), which the author explicitly called out in the PR description.

Confidence Score: 4/5

  • Safe to merge after wrapping the recheck call in a try/catch to prevent gateway errors from silently dropping announcements.
  • The core race-condition fix is correct and well-covered by the new regression test. The refactoring into helper functions (waitForSubagentRunOutcome, applySubagentWaitOutcome) is clean and reduces duplication. One targeted fix is needed: the recheck gateway call should be isolated in its own try/catch so failures degrade to the known timeout outcome rather than suppressing the entire announcement.
  • src/agents/subagent-announce.ts lines 1460–1471 (recheck block missing try/catch)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/subagent-announce.ts
Line: 1460-1471

Comment:
**Recheck not isolated in try/catch — can suppress the announcement entirely**

The recheck `waitForSubagentRunOutcome` call is a best-effort operation, but it sits inside the outer `try/catch` of `runSubagentAnnounceFlow` with no protection of its own. If this gateway call throws (e.g., a 2-second timeout, network hiccup, or gateway unavailability), the outer catch logs the error and returns `didAnnounce = false` — meaning the announcement is dropped altogether instead of falling back to the already-known `"timeout"` outcome.

Before this PR, the flow at this point had already determined the outcome and collected `reply`; the recheck is purely an opportunistic correction. If it fails, the code should continue with the existing timeout outcome rather than aborting the entire announcement.

```suggestion
      // A worker can finish just after the first wait request timed out.
      // If we already have real completion content, do one cached recheck so
      // the final completion event prefers the authoritative terminal state.
      if (outcome?.status === "timeout" && reply?.trim() && params.waitForCompletion !== false) {
        try {
          const rechecked = await waitForSubagentRunOutcome(params.childRunId, 0);
          const applied = applySubagentWaitOutcome({
            wait: rechecked,
            outcome,
            startedAt: params.startedAt,
            endedAt: params.endedAt,
          });
          outcome = applied.outcome;
          params.startedAt = applied.startedAt;
          params.endedAt = applied.endedAt;
        } catch {
          // Best-effort recheck; keep the existing timeout outcome on failure.
        }
      }
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge branch 'main' into fix/subagent-ti..." | Re-trigger Greptile

Comment on lines +1460 to +1471
if (outcome?.status === "timeout" && reply?.trim() && params.waitForCompletion !== false) {
const rechecked = await waitForSubagentRunOutcome(params.childRunId, 0);
const applied = applySubagentWaitOutcome({
wait: rechecked,
outcome,
startedAt: params.startedAt,
endedAt: params.endedAt,
});
outcome = applied.outcome;
params.startedAt = applied.startedAt;
params.endedAt = applied.endedAt;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Recheck not isolated in try/catch — can suppress the announcement entirely

The recheck waitForSubagentRunOutcome call is a best-effort operation, but it sits inside the outer try/catch of runSubagentAnnounceFlow with no protection of its own. If this gateway call throws (e.g., a 2-second timeout, network hiccup, or gateway unavailability), the outer catch logs the error and returns didAnnounce = false — meaning the announcement is dropped altogether instead of falling back to the already-known "timeout" outcome.

Before this PR, the flow at this point had already determined the outcome and collected reply; the recheck is purely an opportunistic correction. If it fails, the code should continue with the existing timeout outcome rather than aborting the entire announcement.

Suggested change
if (outcome?.status === "timeout" && reply?.trim() && params.waitForCompletion !== false) {
const rechecked = await waitForSubagentRunOutcome(params.childRunId, 0);
const applied = applySubagentWaitOutcome({
wait: rechecked,
outcome,
startedAt: params.startedAt,
endedAt: params.endedAt,
});
outcome = applied.outcome;
params.startedAt = applied.startedAt;
params.endedAt = applied.endedAt;
}
// A worker can finish just after the first wait request timed out.
// If we already have real completion content, do one cached recheck so
// the final completion event prefers the authoritative terminal state.
if (outcome?.status === "timeout" && reply?.trim() && params.waitForCompletion !== false) {
try {
const rechecked = await waitForSubagentRunOutcome(params.childRunId, 0);
const applied = applySubagentWaitOutcome({
wait: rechecked,
outcome,
startedAt: params.startedAt,
endedAt: params.endedAt,
});
outcome = applied.outcome;
params.startedAt = applied.startedAt;
params.endedAt = applied.endedAt;
} catch {
// Best-effort recheck; keep the existing timeout outcome on failure.
}
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/subagent-announce.ts
Line: 1460-1471

Comment:
**Recheck not isolated in try/catch — can suppress the announcement entirely**

The recheck `waitForSubagentRunOutcome` call is a best-effort operation, but it sits inside the outer `try/catch` of `runSubagentAnnounceFlow` with no protection of its own. If this gateway call throws (e.g., a 2-second timeout, network hiccup, or gateway unavailability), the outer catch logs the error and returns `didAnnounce = false` — meaning the announcement is dropped altogether instead of falling back to the already-known `"timeout"` outcome.

Before this PR, the flow at this point had already determined the outcome and collected `reply`; the recheck is purely an opportunistic correction. If it fails, the code should continue with the existing timeout outcome rather than aborting the entire announcement.

```suggestion
      // A worker can finish just after the first wait request timed out.
      // If we already have real completion content, do one cached recheck so
      // the final completion event prefers the authoritative terminal state.
      if (outcome?.status === "timeout" && reply?.trim() && params.waitForCompletion !== false) {
        try {
          const rechecked = await waitForSubagentRunOutcome(params.childRunId, 0);
          const applied = applySubagentWaitOutcome({
            wait: rechecked,
            outcome,
            startedAt: params.startedAt,
            endedAt: params.endedAt,
          });
          outcome = applied.outcome;
          params.startedAt = applied.startedAt;
          params.endedAt = applied.endedAt;
        } catch {
          // Best-effort recheck; keep the existing timeout outcome on failure.
        }
      }
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56e36b39ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// If we already have real completion content, do one cached recheck so
// the final completion event prefers the authoritative terminal state.
if (outcome?.status === "timeout" && reply?.trim() && params.waitForCompletion !== false) {
const rechecked = await waitForSubagentRunOutcome(params.childRunId, 0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep timeout announcements when recheck RPC fails

The new timeout recheck is unguarded, so any agent.wait failure here (for example a transient gateway timeout/auth reconnect) throws into the outer runSubagentAnnounceFlow catch and aborts delivery, returning didAnnounce=false even though a reply was already available and we could still send the original timeout completion event. Because this branch is only a best-effort status refinement, a recheck error should not suppress the announcement path.

Useful? React with 👍 / 👎.

@BunsDev
Copy link
Copy Markdown
Member

BunsDev commented Mar 23, 2026

Addressed Greptile’s try/catch concern in 474ef7b by making the timeout recheck best-effort. If the recheck RPC fails, the flow now keeps the existing timeout outcome and continues announcing instead of dropping the announcement entirely. Targeted subagent announce suites passed locally: src/agents/subagent-announce.format.e2e.test.ts and src/agents/subagent-announce.timeout.test.ts.

@BunsDev BunsDev merged commit 34c5748 into main Mar 23, 2026
3 checks passed
@BunsDev BunsDev deleted the fix/subagent-timeout-race branch March 23, 2026 20:36
Linux2010 pushed a commit to Linux2010/openclaw that referenced this pull request Mar 23, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot bot commented Mar 23, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🔵 Low Prompt/Log injection via untrusted wait.error embedded in statusLabel outside untrusted delimiters

1. 🔵 Prompt/Log injection via untrusted wait.error embedded in statusLabel outside untrusted delimiters

Property Value
Severity Low
CWE CWE-74
Location src/agents/subagent-announce.ts:1492-1500

Description

The subagent announce flow incorporates wait.error (returned by the gateway) into outcome.error, and then interpolates it into statusLabel. That statusLabel is later embedded into the internal steer/prompt text outside the <<<BEGIN_UNTRUSTED_CHILD_RESULT>>> / <<<END_UNTRUSTED_CHILD_RESULT>>> delimiters.

This enables prompt/log injection if an attacker (or compromised gateway/worker) can influence wait.error to include newlines and crafted text (e.g., additional Action: lines, fake headers, delimiter-like strings), because it will be rendered as part of the trusted “status” line in the internal message.

Vulnerable flow:

  • Input: wait.error from agent.wait gateway response
  • Propagation: applySubagentWaitOutcome assigns { status: "error", error: waitError }
  • Sink: statusLabel string interpolation and inclusion in AgentInternalEvent.statusLabel, later formatted into the prompt by formatTaskCompletionEvent() as status: ${event.statusLabel} (outside the untrusted result block)

Vulnerable code:

: outcome.status === "error"
  ? `failed: ${outcome.error || "unknown error"}`
  : "finished with unknown status";

Recommendation

Treat wait.error as untrusted data and prevent it from affecting the trusted structure of internal prompt/log lines.

Options (prefer 1 or combine):

  1. Sanitize/encode before interpolation into statusLabel (e.g., replace newlines/tabs and truncate):
function safeOneLine(text: string, max = 200): string {
  return text.replace(/[\r\n\t]/g, " ").slice(0, max);
}

const safeError = typeof outcome.error === "string" ? safeOneLine(outcome.error) : undefined;
const statusLabel = outcome.status === "error"
  ? `failed: ${safeError || "unknown error"}`
  : /* existing */;
  1. Move untrusted error details into the untrusted result block (or add a dedicated delimited block for error details) and keep statusLabel a fixed set of known strings.

  2. Consider validating wait.status and wait.error schema at the gateway boundary and rejecting unexpected formats.


Analyzed PR: #53127 at commit 5f66018

Last updated on: 2026-03-23T21:03:41Z

Linux2010 pushed a commit to Linux2010/openclaw that referenced this pull request Mar 23, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
iclem pushed a commit to iclem/openclaw that referenced this pull request Mar 23, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
Linux2010 pushed a commit to Linux2010/openclaw that referenced this pull request Mar 24, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
hzq001 pushed a commit to hzq001/openclaw that referenced this pull request Mar 24, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
furaul pushed a commit to furaul/openclaw that referenced this pull request Mar 24, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
Arry8 pushed a commit to Arry8/openclaw that referenced this pull request Mar 25, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
netandreus pushed a commit to netandreus/openclaw that referenced this pull request Mar 25, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 25, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
(cherry picked from commit 34c5748)
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 25, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
(cherry picked from commit 34c5748)
npmisantosh pushed a commit to npmisantosh/openclaw that referenced this pull request Mar 25, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
Linux2010 pushed a commit to Linux2010/openclaw that referenced this pull request Mar 26, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
Linux2010 pushed a commit to Linux2010/openclaw that referenced this pull request Mar 26, 2026
Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>
0x666c6f added a commit to 0x666c6f/openclaw that referenced this pull request Mar 26, 2026
…claw#105)

* fix(web-search): mark DuckDuckGo experimental

* docs(tools): update DuckDuckGo Search for landed plugin code

- Mark as experimental (not just unofficial)
- Add region and safeSearch tool parameters (from DDG schema)
- Add plugin config example for region/safeSearch defaults
- Document auto-detection order (100 = last)
- Note SafeSearch defaults to moderate
- Verified against extensions/duckduckgo/src/

* fix(agents): deny local MEDIA paths for MCP results

* Usage: include reset and deleted session archives (openclaw#43215)

Merged via squash.

Prepared head SHA: 49ed6c2
Co-authored-by: rcrick <[email protected]>
Co-authored-by: frankekn <[email protected]>
Reviewed-by: @frankekn

* docs(tools): soften DDG wording (scrapes -> pulls/gathers)

* fix(build): add stable memory-cli dist entry (openclaw#51759)

Co-authored-by: oliviareid-svg <[email protected]>
Co-authored-by: Frank <[email protected]>

* refactor!: drop legacy CLAWDBOT env compatibility

* refactor!: remove moltbot state-dir migration fallback

* fix(gateway): preserve async hook ingress provenance

* fix(ci): write dist build stamp after builds

* perf: trim vitest hot imports and refresh manifests

* fix(security): unwrap time dispatch wrappers

* fix(plugin-sdk): fall back to src root alias files

* fix(ci): skip docs-only preflight pnpm audit

* docs(changelog): note time exec approval fix

* docs: refresh plugin-sdk api baseline

* fix(runtime): make dist-runtime staging idempotent

* fix(media): bound remote error-body snippet reads

* fix(gateway): gate internal command persistence mutations

* fix: restrict remote marketplace plugin sources

* fix(runtime): skip peer resolution for bundled plugin deps

* docs(agents): prefer current test model examples

* fix(exec): escape invisible approval filler chars

* test(models): refresh example model fixtures

* fix(security): unify dispatch wrapper approval hardening

* fix(security): harden explicit-proxy SSRF pinning

* fix: gate synology chat reply name matching

* docs: clarify sessions_spawn ACP vs subagent policies

* refactor(exec): split wrapper resolution modules

* refactor(exec): make dispatch wrapper semantics spec-driven

* refactor(exec): share wrapper trust planning

* refactor(exec): rename wrapper plans for trust semantics

* fix: include .npmrc in onboard docker build

* test: trim docker live auth mounts

* Docs: refresh config baseline for Synology Chat

* refactor: clarify synology delivery identity names

* refactor: centralize synology dangerous name matching

* refactor: narrow synology legacy name lookup

* refactor: audit synology dangerous name matching

* refactor: dedupe synology config schema

* fix: normalize scoped vitest filter paths

* fix(voice-call): harden webhook pre-auth guards

* fix(synology-chat): fail closed shared webhook paths

* docs: credit nexrin in synology changelog

* test: fix base vitest thread regressions

* test: finish base vitest thread fixture fixes

* test(voice-call): accept oversize webhook socket resets

* test: honor env auth in gateway live probes

* fix: harden plugin docker e2e

* Docs: align MiniMax examples with M2.7

* fix(ci): restore stale guardrails and baselines

* Test: isolate qr dashboard integration suite

* Gateway: resolve fallback plugin context lazily

* fix: bind bootstrap setup codes to node profile

* fix(tlon): unify settings reconciliation semantics

* refactor(synology-chat): type startup webhook path policy

* docs(synology-chat): clarify multi-account webhook paths

* refactor: unify minimax model and failover live policies

* docs: sync minimax m2.7 references

* fix: harden Windows Parallels smoke installs

* docs: reorder unreleased changelog by user impact

* refactor: remove embedded runner cwd mutation

* Infra: support shell carrier allow-always approvals

* refactor: centralize bootstrap profile handling

* refactor: reuse canonical setup bootstrap profile

* fix(plugin-sdk): resolve hashed diagnostic events chunks

* fix(plugin-sdk): normalize hashed diagnostic event exports

* test: fix ci env-sensitive assertions

* fix(gateway): fail closed on unresolved discovery endpoints

* feat: add slash plugin installs

* fix(media): block remote-host file URLs in loaders

* fix(media): harden secondary local path seams

* test: harden no-isolate reply teardown

* docs(changelog): add Windows media security fix

* refactor(gateway): centralize discovery target handling

* test: narrow live transcript scaffolding strip

* test: fix ci docs drift and bun qr exit handling

* fix(browser): enforce node browser proxy allowProfiles

* refactor(media): share local file access guards

* test: stabilize ci test harnesses

* test: harden no-isolate test module resets

* fix(plugins): preserve live hook registry during gateway runs

* test: fix channel summary registry setup

* test: harden isolated test mocks

* chore(plugins): remove opik investigation checkpoints

* ACPX: align pinned runtime version (openclaw#52730)

* ACPX: align pinned runtime version

* ACPX: drop version example from help text

* test: stop leaking image workspace temp dirs

* fix(android): gate canvas bridge to trusted pages (openclaw#52722)

* fix(android): gate canvas bridge to trusted pages

* fix(changelog): note android canvas bridge gating

* Update apps/android/app/src/main/java/ai/openclaw/app/node/CanvasActionTrust.kt

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(android): snapshot canvas URL on UI thread

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* test: isolate base vitest thread blockers

* fix: sync agent and autoreply e2e updates

* test: harden no-isolate mocked module resets

* docs: reorder unreleased changelog

* fix(changelog): note windows media path guardrails (openclaw#52738)

* fix: alphabetize web search provider listings

* docs: clarify unreleased breaking changes

* test: harden ci isolated mocks

* fix: align websocket stream fallback types

* test: finish no-isolate suite hardening

* style: format image-generation runtime tests

* fix(memory-core): register memory tools independently to prevent coupled failure (openclaw#52668)

Merged via admin squash because current required CI failures are inherited from base and match latest `main` failures outside this PR's `memory-core` surface.

Prepared head SHA: df7f968
Co-authored-by: artwalker <[email protected]>
Reviewed-by: @frankekn

* fix(status): recompute fallback context window (openclaw#51795)

* fix(status): recompute fallback context window

* fix(status): keep live context token caps on fallback

* fix(status): preserve fallback runtime context windows

* fix(status): preserve configured fallback context caps

* fix(status): keep provider-aware transcript context lookups

* fix(status): preserve explicit fallback context caps

* fix(status): clamp fallback configured context caps

* fix(status): keep raw runtime slash ids

* fix(status): refresh plugin-sdk api baseline

* fix(status): preserve fallback context lookup

* test(status): refresh plugin-sdk api baseline

* fix(status): keep runtime slash-id context lookup

---------

Co-authored-by: create <[email protected]>
Co-authored-by: Frank Yang <[email protected]>
Co-authored-by: RichardCao <[email protected]>

* fix(telegram): make buttons schema optional in message tool

The Telegram plugin injects a `buttons` property into the message tool
schema via `createMessageToolButtonsSchema()`, but without wrapping it
in `Type.Optional()`. This causes TypeBox to include `buttons` in the
JSON Schema `required` array.

In isolated sessions (e.g. cron jobs) where no `currentChannel` is set,
all plugin schemas are merged into the message tool. When the LLM calls
the message tool without a `buttons` parameter, AJV validation fails
with: `buttons: must have required property 'buttons'`.

Wrap the buttons schema in `Type.Optional()` so it is not required.

* fix: keep message-tool buttons optional for Telegram and Mattermost (openclaw#52589) (thanks @tylerliu612)

* test: update codex test fixtures to gpt-5.4

* fix: repair runtime seams after rebase

* fix: restore Telegram topic announce delivery (openclaw#51688) (thanks @mvanhorn)

When `replyLike.text` or `replyLike.caption` is an unexpected
non-string value (edge case from some Telegram API responses),
the reply body was coerced to "[object Object]" via string
concatenation. Add a `typeof === "string"` guard to gracefully
fall back to empty string, matching the existing pattern used
for `quoteText` in the same function.

Co-authored-by: Penchan <[email protected]>

* docs: sync generated release baselines

* test: isolate pi embedded model thread fixtures

* fix: restore provider runtime lazy boundary

* fix: preserve Telegram reply context text (openclaw#50500) (thanks @p3nchan)

* fix: guard Telegram reply context text (openclaw#50500) (thanks @p3nchan)

* fix: preserve Telegram reply caption fallback (openclaw#50500) (thanks @p3nchan)

---------

Co-authored-by: Ayaan Zaidi <[email protected]>

* fix: harden gateway SIGTERM shutdown (openclaw#51242) (thanks @juliabush)

* fix: increase shutdown timeout to avoid SIGTERM hang

* fix(telegram): abort polling fetch on shutdown to prevent SIGTERM hang

* fix(gateway): enforce hard exit on shutdown timeout for SIGTERM

* fix: tighten gateway shutdown watchdog

* fix: harden gateway SIGTERM shutdown (openclaw#51242) (thanks @juliabush)

---------

Co-authored-by: Ayaan Zaidi <[email protected]>

* build: prepare 2026.3.22-beta.1

* fix: restore provider runtime lazy boundary

* test: add parallels npm update smoke

* test: split pi embedded model thread fixtures

* fix: stop browser server tests from launching real chrome

* test: stabilize live provider docker probes

* fix: restart windows gateway after npm update

* test: isolate server-context browser harness imports

* test: inject model runtime hooks for thread-safe tests

* test: snapshot ci timeout investigation

* test: target gemini 3.1 flash alias

* test: stabilize trigger handling and hook e2e tests

* build: prepare 2026.3.22

* test: harden channel suite isolation

* test: inject thread-safe deps for agent tools

* test: raise timeout for slow provider auth normalization

* ci: stabilize windows and bun unit lanes

* test: inject thread-safe gateway and ACP seams

* test: isolate pi model and reset-model thread fixtures

* build: prepare 2026.3.23

* test: inject image-tool provider deps for raw threads

* test: stabilize e2e module isolation

* test: decouple vitest config checks from ambient env

* fix: harden parallels smoke agent invocation

* test: avoid repo-root perf profile artifacts

* test: inject thread-safe base seams

* fix: document Telegram asDocument alias (openclaw#52461) (thanks @bakhtiersizhaev)

* feat(telegram): add asDocument param to message tool

Adds `asDocument` as a user-facing alias for the existing `forceDocument`
parameter in the message tool. When set to `true`, media files (images,
videos, GIFs) are sent via `sendDocument` instead of `sendPhoto`/
`sendVideo`/`sendAnimation`, preserving the original file quality
without Telegram compression.

This is useful when agents need to deliver high-resolution images or
uncompressed files to users via Telegram.

`asDocument` is intentionally an alias rather than a replacement — the
existing `forceDocument` continues to work unchanged.

Changes:
- src/agents/tools/message-tool.ts: add asDocument to send schema
- src/agents/tools/telegram-actions.ts: OR asDocument into forceDocument
- src/infra/outbound/message-action-runner.ts: same OR logic for outbound path
- extensions/telegram/src/channel-actions.ts: read and forward asDocument
- src/channels/plugins/actions/actions.test.ts: add test case

* fix: restore channel-actions.ts to main version (rebase conflict fix)

* fix(test): match asDocument test payload to actual params structure

* fix(telegram): preserve forceDocument alias semantics

* fix: document Telegram asDocument alias (openclaw#52461) (thanks @bakhtiersizhaev)

---------

Co-authored-by: Бахтиер Сижаев <[email protected]>
Co-authored-by: Ayaan Zaidi <[email protected]>

* fix: refactor deepseek bundled plugin (openclaw#48762) (thanks @07akioni)

* fix: declare typebox runtime dep for mattermost plugin

* test: reset line webhook mocks between cases

* test: split attempt spawn-workspace thread fixtures

* test: remove replaced spawn-workspace monolith

* refactor: isolate attempt context engine thread helpers

* CI: remove npm release preview workflow (openclaw#52825)

* CI: remove npm release preview workflow

* Docs: align release maintainer skill with manual publish

* Docs: expand release maintainer skill flow

* test: stabilize gateway thread harness

* test: fix status plugin pagination expectation

* test: harden channel suite isolation

* build: sync lockfile for mattermost plugin

* fix: ensure env proxy dispatcher before MiniMax and OpenAI Codex OAuth flows (openclaw#52228)

Verified:
- pnpm install --frozen-lockfile
- NPM_CONFIG_CACHE=/tmp/openclaw-npm-cache-52228 pnpm build
- pnpm check
- pnpm test:macmini (failed on inherited pre-existing plugin contract test: src/plugins/contracts/registry.contract.test.ts missing deepseek in bundled provider contract registry outside this PR surface)

Co-authored-by: openperf <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* fix: restore ci gates

* test: stabilize channel ci gate

* docs: refresh generated config baseline

* release: verify control-ui assets are included in npm tarball

* release-check: include stderr/stdout when npm pack fails

* release: add changelog for control UI tarball check

* fix: keep session transcript pointers fresh after compaction (openclaw#50688)

Co-authored-by: Frank Yang <[email protected]>

* fix(msteams): isolate probe test env credentials

* release: automate macOS publishing (openclaw#52853)

* release: automate macOS publishing

* release: keep mac appcast in openclaw repo

* release: add preflight-only release workflow runs

* release: keep appcast updates manual

* release: generate signed appcast as workflow artifact

* release: require preflight before publish

* release: require mac app for every release

* docs: clarify every release ships mac app

* release: document Sparkle feed and SHA rules

* release: keep publish flow tag-based

* release: stabilize mac appcast flow

* release: document local mac fallback

* Update CHANGELOG.md

* Improve PR template regression prompts

* fix(agents): preserve anthropic thinking block order (openclaw#52961)

* fix(release): ship bundled plugins in pack artifacts

* fix(config): keep built-in channels out of plugin allowlists (openclaw#52964)

* fix(config): keep built-in channels out of plugin allowlists

* docs(changelog): note doctor whatsapp allowlist fix

* docs(changelog): move doctor whatsapp fix to top

* Update CHANGELOG.md

* fix(config): keep built-in auto-enable idempotent

* fix(release): preserve shipped channel surfaces in npm tar (openclaw#52913)

* fix(channels): ship official channel catalog (openclaw#52838)

* fix(release): keep shipped bundles in npm tar (openclaw#52838)

* build(release): fix rebased release-check helpers (openclaw#52838)

* fix(gateway): harden supervised lock and browser attach readiness

* fix(matrix): avoid duplicate runtime api exports

* fix(gateway): avoid probe false negatives after connect

* docs(changelog): note release and matrix fixes

* fix(plugins): unblock Discord/Slack message tool sends and Feishu media (openclaw#52991)

* fix(plugins): unblock Discord and Slack message tool payloads

* docs(changelog): note Discord Slack and Feishu message fixes

* fix(channels): preserve external catalog overrides (openclaw#52988)

* fix(channels): preserve external catalog overrides

* fix(channels): clarify catalog precedence

* fix(channels): respect overridden install specs

* fix(gateway): require admin for agent session reset

* fix(voice-call): stabilize plivo v2 replay keys

* fix(gateway): require auth for canvas routes

* fix(clawhub): resolve auth token for skill browsing (openclaw#53017)

* fix(clawhub): resolve auth token for skill browsing

* docs(changelog): note clawhub skill auth fix

* fix(release): raise npm pack size budget

* Tests: fix fresh-main regressions (openclaw#53011)

* Tests: fix fresh-main regressions

* Tests: avoid chat notice cache priming

---------

Co-authored-by: Vincent Koc <[email protected]>

* fix(config): ignore stale plugin allow entries

* fix(browser): reuse running loopback browser after probe miss

* fix(clawhub): honor macOS auth config path (openclaw#53034)

* docs: fix nav ordering, missing pages, and stale model references

- Sort providers alphabetically in docs.json nav
- Sort channels alphabetically in docs.json nav (slack before synology-chat)
- Add install/migrating-matrix to Maintenance nav section (was orphaned)
- Remove zh-CN/plugins/architecture from nav (file does not exist)
- Add Voice Call to channels index page
- Add missing providers to providers index (DeepSeek, GitHub Copilot, OpenCode Go, Synthetic)
- Sort providers index alphabetically
- Update stale claude-3-5-sonnet model reference to claude-sonnet-4-6 in webhook docs

* fix(clawhub): preserve XDG auth path on macOS

* Agents: fix runtime web_search provider selection (openclaw#53020)

Co-authored-by: Vincent Koc <[email protected]>

* docs: fix CLI command tree, SDK import path, and tool group listing

- Remove non-existent 'secrets migrate' from CLI command tree
- Add actual secrets subcommands: audit, configure, apply
- Add missing plugin subcommands: inspect, uninstall, update, marketplace list
- Fix plugins info -> inspect (actual command name)
- Add message send and broadcast subcommands to command tree
- Remove misleading deprecated import from sdk-overview
- Add sessions_yield and subagents to group:sessions tool group docs
- Fix formatting

* fix(gateway): guard openrouter auto pricing recursion (openclaw#53055)

* test: refresh thread-safe agent fixtures

* Release: fix npm release preflight under pnpm (openclaw#52985)

Co-authored-by: Vincent Koc <[email protected]>

* docs(changelog): add channel catalog override note (openclaw#52988) (openclaw#53059)

* fix: harden update dev switch and refresh changelog

* fix(mistral): repair max-token defaults and doctor migration (openclaw#53054)

* fix(mistral): repair max-token defaults and doctor migration

* fix(mistral): add missing small-model repair cap

* fix(plugins): enable bundled Brave web search plugin by default (openclaw#52072)

Brave is a bundled web search plugin but was missing from
BUNDLED_ENABLED_BY_DEFAULT, causing it to be filtered out during
provider resolution. This made web_search unavailable even when
plugins.entries.brave.enabled was configured.

Fixes openclaw#51937

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Vincent Koc <[email protected]>

* fix(release): fail empty control ui tarballs

* Revert "fix(plugins): enable bundled Brave web search plugin by default (openclaw#52072)"

This reverts commit 0ea3c4d.

* Telegram: preserve inbound debounce order

* Telegram: fix fire-and-forget debounce order

* fix(reply): refresh followup drain callbacks

* Update CHANGELOG.md

* fix(reply): preserve no-debounce inbound concurrency

* fix(reply): clear idle followup callbacks

* fix(inbound): bound tracked debounce keys

* fix: preserve debounce and followup ordering (openclaw#52998) (thanks @osolmaz)

* fix(discord): reply on native command auth failures (openclaw#53072)

* docs(changelog): add missing recent fixes

* fix: bound tracked debounce key accounting

* fix packaged control ui asset lookup (openclaw#53081)

* fix(cli): preserve posix default git dir

* build: prepare 2026.3.23-beta.1

* test: harden canvas host undici isolation

* docs(changelog): credit web search runtime fix

* fix(openai-codex): bootstrap proxy on oauth refresh (openclaw#53078)

Verified:
- pnpm install --frozen-lockfile
- pnpm exec vitest run extensions/openai/openai-codex-provider.runtime.test.ts extensions/openai/openai-provider.test.ts

* release: harden preflight workflows (openclaw#53087)

* release: harden preflight-only workflows

* release: require main for publish runs

* release: select xcode for macos workflow

* release: retry flaky macos preflight steps

* ci: shard bun test lane

* Fix Control UI operator.read scope handling (openclaw#53110)

Preserve Control UI scopes through the device-auth bypass path, normalize implied operator device-auth scopes, ignore cached under-scoped operator tokens, and degrade read-backed main pages gracefully when a connection truly lacks operator.read.

Co-authored-by: Val Alexander <[email protected]>

* build: prepare 2026.3.23

* fix(agents): prefer runtime snapshot for skill secrets

* docs(changelog): note skill secretref runtime fix

* fix(memory): bootstrap lancedb runtime on demand (openclaw#53111)

Bootstrap LanceDB into plugin runtime state on first use for packaged/global installs, keep @lancedb/lancedb plugin-local, and add regression coverage for bundled, cached, retry, and Nix fail-fast runtime paths.

Co-authored-by: Val Alexander <[email protected]>

* build: finalize 2026.3.23 release

* release: upload macos preflight artifacts (openclaw#53105)

* release: upload macos preflight artifacts

* release: speed up macos preflight

* release: use xlarge macos runner

* release: skip dmg path in macos preflight

* fix(subagents): recheck timed-out announce waits (openclaw#53127)

Recheck timed-out subagent announce waits against the latest runtime snapshot before announcing timeout, and keep that recheck best-effort so transient gateway failures do not suppress the announcement.

Co-authored-by: Val Alexander <[email protected]>

* docs(feishu): replace botName with name in config examples (openclaw#52753)

Merged via squash.

Prepared head SHA: 5237726
Co-authored-by: haroldfabla2-hue <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* fix(plugins): accept clawhub uninstall specs

* test(auth): align device scope expectations (openclaw#53151)

* fix: prevent delivery-mirror re-delivery and raise Slack chunk limit (openclaw#45489)

Merged via squash.

Prepared head SHA: c7664c7
Co-authored-by: theo674 <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* Infra: tighten shell-wrapper positional-argv allowlist matching (openclaw#53133)

* Infra: tighten shell carrier allowlist matching

* fix(security): tighten shell carrier allowlist matcher

* fix: generalize api_error detection for fallback model triggering (openclaw#49611)

Co-authored-by: Ayush Ojha <[email protected]>
Co-authored-by: altaywtf <[email protected]>

* feat(modelstudio): add standard (pay-as-you-go) DashScope endpoints for Qwen (openclaw#43878)

Add Standard API Key auth methods for China (dashscope.aliyuncs.com)
and Global/Intl (dashscope-intl.aliyuncs.com) pay-as-you-go endpoints
alongside the existing Coding Plan (subscription) endpoints.

Also updates group label to 'Qwen (Alibaba Cloud Model Studio)' and
fixes glm-4.7 -> glm-5 in Coding Plan note messages.

Co-authored-by: wenmeng zhou <[email protected]>

* Release: privatize macOS publish flow (openclaw#53166)

* fix(diagnostics): redact credentials from cache-trace diagnostic output

Refs openclaw#53103

* Release: document manual macOS asset upload (openclaw#53178)

* Release: document manual macOS asset upload

* Release: document macOS smoke-test mode

* docs(changelog): reorder release highlights

* test(whatsapp): stabilize login coverage in shared workers

* test(whatsapp): preserve session exports in login coverage

* test(whatsapp): preserve media test module exports

* test(whatsapp): preserve harness session exports

* fix(ci): stabilize whatsapp extension checks

* test: make update-cli checkout path assertion platform-safe

* fix(auth): prevent stale auth store reverts (openclaw#53211)

* Doctor: prune stale plugin allowlist and entry refs (openclaw#53187)

Signed-off-by: sallyom <[email protected]>

* test: stabilize test isolation

* test: update command coverage

* test: expand gemini live transcript stripping

* test: fix update-cli default path assertion

* chore(sre:PLA-920): adopt upstream sync changes

* fix(sre:PLA-920): align branch with adopted upstream tree

* build(sre:PLA-920): refresh dist artifacts

* test(sre:PLA-920): align incident-format expectations

---------

Signed-off-by: sallyom <[email protected]>
Co-authored-by: Vincent Koc <[email protected]>
Co-authored-by: Peter Steinberger <[email protected]>
Co-authored-by: Rick_Xu <[email protected]>
Co-authored-by: rcrick <[email protected]>
Co-authored-by: frankekn <[email protected]>
Co-authored-by: oliviareid-svg <[email protected]>
Co-authored-by: oliviareid-svg <[email protected]>
Co-authored-by: Frank <[email protected]>
Co-authored-by: scoootscooob <[email protected]>
Co-authored-by: ruochen <[email protected]>
Co-authored-by: Onur Solmaz <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Frank Yang <[email protected]>
Co-authored-by: artwalker <[email protected]>
Co-authored-by: RichardCao <[email protected]>
Co-authored-by: create <[email protected]>
Co-authored-by: RichardCao <[email protected]>
Co-authored-by: liuyang <[email protected]>
Co-authored-by: Ayaan Zaidi <[email protected]>
Co-authored-by: Matt Van Horn <[email protected]>
Co-authored-by: Penchan <[email protected]>
Co-authored-by: Penchan <[email protected]>
Co-authored-by: Julia Bush <[email protected]>
Co-authored-by: Bakhtier Sizhaev <[email protected]>
Co-authored-by: Бахтиер Сижаев <[email protected]>
Co-authored-by: wangchunyue <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>
Co-authored-by: evann <[email protected]>
Co-authored-by: Robin Waslander <[email protected]>
Co-authored-by: Sathvik Veerapaneni <[email protected]>
Co-authored-by: Nimrod Gutman <[email protected]>
Co-authored-by: Luke <[email protected]>
Co-authored-by: scoootscooob <[email protected]>
Co-authored-by: Jamil Zakirov <[email protected]>
Co-authored-by: TheRipper <[email protected]>
Co-authored-by: Quinn H. <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Val Alexander <[email protected]>
Co-authored-by: betoblair <[email protected]>
Co-authored-by: haroldfabla2-hue <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Co-authored-by: Altay <[email protected]>
Co-authored-by: theo674 <[email protected]>
Co-authored-by: theo674 <[email protected]>
Co-authored-by: Ayush Ojha <[email protected]>
Co-authored-by: Ayush Ojha <[email protected]>
Co-authored-by: George Zhang <[email protected]>
Co-authored-by: wenmeng zhou <[email protected]>
Co-authored-by: Onur <[email protected]>
Co-authored-by: Sally O'Malley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Subagent timeout status race condition: workers show 'timed out' when successfully completed

2 participants