fix(acp): map error states to end_turn instead of unconditional refusal #41187

Merged
onutc merged 2 commits into openclaw:main from pejmanjohn:fix/acp-stop-reason-mapping
Mar 9, 2026

Conversation

@pejmanjohn
Contributor

@pejmanjohn pejmanjohn commented Mar 9, 2026

Summary

  • Problem: handleChatEvent maps all gateway "error" states to ACP stop reason "refusal". This is semantically wrong — "refusal" means the agent deliberately refused (e.g., safety filter), but most errors are transient failures (timeouts, rate limits, API crashes)
  • Why it matters: Users see "agent refused to respond" in their IDE when the actual problem was a timeout or API error — confusing UX that makes users second-guess their prompts instead of retrying
  • What changed: Error states now resolve as "end_turn" instead of "refusal". Added a TODO noting that proper refusal detection requires a structured errorKind field in ChatEventSchema (which does not currently exist)
  • What did NOT change: "final" → "end_turn"/"max_tokens" mapping, "aborted" → "cancelled" mapping, delta event handling
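
The mapping summarized above can be sketched as a small pure function. This is a minimal illustration, not the code in src/acp/translator.ts: the type names, the `mapStopReason` function, and the `hitTokenLimit` flag are assumptions made for the example, since the PR does not show how the "final" state chooses between "end_turn" and "max_tokens".

```typescript
// Hypothetical types: the names are illustrative, not taken from the repository.
type GatewayChatState = "final" | "aborted" | "error";
type StopReason = "end_turn" | "max_tokens" | "cancelled" | "refusal";

// Sketch of the post-fix mapping. `hitTokenLimit` is an assumed input;
// how the real code detects a token limit is not shown in this PR.
function mapStopReason(state: GatewayChatState, hitTokenLimit = false): StopReason {
  switch (state) {
    case "final":
      return hitTokenLimit ? "max_tokens" : "end_turn";
    case "aborted":
      return "cancelled";
    case "error":
      // Before this PR this branch returned "refusal". Most errors are
      // transient failures, so they no longer claim a deliberate refusal.
      return "end_turn";
  }
}

console.log(mapStopReason("error"));
console.log(mapStopReason("aborted"));
```

Only the "error" branch differs from the previous behavior in this sketch; the other two branches correspond to the mappings listed as unchanged.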

Why not detect refusals heuristically?

The gateway error event schema (ChatEventSchema) only carries errorMessage (free text from formatForLog(error)). There are no structured fields like reason or errorCode — the schema enforces additionalProperties: false. Heuristic substring matching on free text is unreliable and would reintroduce the false-positive problem this PR fixes. The correct path is adding a structured errorKind field upstream.
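
To illustrate why structured fields cannot appear on an error event, here is a rough sketch of what `additionalProperties: false` implies. It is a hand-rolled check, not the project's schema or validator: the allowed key set and the `findUnexpectedKeys` helper are assumptions for the example, and the real ChatEventSchema may define more fields than the two shown.

```typescript
// Assumed key set for an error chat event. Only `state` and `errorMessage`
// are mentioned in this PR; the real schema may define additional fields.
const ALLOWED_ERROR_EVENT_KEYS = new Set(["state", "errorMessage"]);

// Mimics `additionalProperties: false`: any key outside the allowed set
// makes the payload invalid, so it is reported here.
function findUnexpectedKeys(payload: Record<string, unknown>): string[] {
  return Object.keys(payload).filter((key) => !ALLOWED_ERROR_EVENT_KEYS.has(key));
}

const realisticEvent = { state: "error", errorMessage: "upstream request timed out" };
const eventWithStructuredField = { ...realisticEvent, errorCode: "content_filter" };

console.log(findUnexpectedKeys(realisticEvent)); // no unexpected keys
console.log(findUnexpectedKeys(eventWithStructuredField)); // reports "errorCode"
```

Under that assumption, a payload carrying `errorCode` or `reason` would fail validation before it reached the translator, which leaves free text as the only signal available.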

Change Type

  • Bug fix

Scope

  • Gateway / orchestration
  • API / contracts

Linked Issue/PR

User-visible / Behavior Changes

  • Gateway errors now report end_turn instead of refusal to ACP clients

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS 15.4 (arm64)
  • Runtime: Node v24.13.0

Steps

  1. Start an ACP session via openclaw acp
  2. Trigger a gateway error (e.g., kill gateway mid-request, misconfigure model)
  3. Observe the stop reason reported to the client

Expected

  • Client receives end_turn

Actual (before fix)

  • Client receives refusal

Evidence

  • Failing test/log before + passing after
  • New test file: src/acp/translator.stop-reason.test.ts (3 tests: error→end_turn, bare error→end_turn, aborted→cancelled regression guard)
  • All 124 existing ACP tests pass with no regressions

Human Verification

  • Verified: All ACP unit tests pass (pnpm vitest run src/acp/ — 124 tests, 16 files)
  • Verified: ChatEventSchema uses additionalProperties: false — structured error fields cannot exist on error events, confirming heuristic matching would be dead code
  • Not verified: End-to-end with a live ACP client (Zed)

AI-Assisted 🤖

  • Initial code generation via Codex CLI, significantly revised during human review
  • Testing: fully tested, 3 focused unit tests

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes — changes stop reason for errors from refusal to end_turn. Clients should handle all stop reason values.
  • Config/env changes? No
  • Migration needed? No

Failure Recovery

  • Revert this single commit

Risks and Mitigations

  • Risk: If a genuine content-filter refusal surfaces through the error path, it will now report end_turn instead of refusal
    • Mitigation: The gateway does not currently emit structured refusal signals on error events. When ChatEventSchema gains an errorKind field, this mapping should be updated (noted in TODO comment). Until then, end_turn is strictly more correct than the unconditional refusal that was there before.

@greptile-apps
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR correctly fixes the semantic mismatch where all gateway "error" states were unconditionally mapped to the ACP "refusal" stop reason. The new isRefusalErrorPayload() helper ensures only genuine content-filter/safety refusals produce "refusal", while transient failures (timeouts, API crashes) now produce the more appropriate "end_turn".

Key observations:

  • The core logic change is sound and well-motivated.
  • The isRefusalErrorPayload function checks 10 payload paths, which is a pragmatic approach given that gateway error payloads are not formally typed.
  • One concern: The function checks free-text fields (payload.errorMessage, error?.message) for the broad keyword "safety". This can cause false positives — e.g., a gateway error message like "Type safety violation", "Memory safety check failed", or "safety subsystem unavailable" would all resolve to "refusal" instead of "end_turn", directly contradicting the PR's stated preference for erring toward end_turn in ambiguous cases. Restricting the "safety" hint to structured/semantic fields (reason, code, etc.) would eliminate this class of false positives without meaningfully reducing true-positive coverage.
  • The test suite covers the three main scenarios well but is missing a test that exercises the free-text errorMessage + "safety" false-positive path.
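
The false-positive concern can be reproduced with a small sketch. The hint list and the `naiveLooksLikeRefusal` function are illustrative assumptions, not the code in the reviewed commit; the sample messages are the ones quoted in this review.

```typescript
// Illustrative hint list; the hints in the reviewed commit may differ.
const FREE_TEXT_HINTS = ["content_filter", "safety", "refusal"];

// Naive substring check on a free-text error message (hypothetical helper).
function naiveLooksLikeRefusal(errorMessage: string): boolean {
  const text = errorMessage.toLowerCase();
  return FREE_TEXT_HINTS.some((hint) => text.includes(hint));
}

const sampleMessages = [
  "Type safety violation",
  "Memory safety check failed",
  "safety subsystem unavailable",
  "request timed out",
];

for (const message of sampleMessages) {
  // The first three match "safety" even though none is a content-filter refusal.
  console.log(`${message} -> ${naiveLooksLikeRefusal(message)}`);
}
```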

Confidence Score: 3/5

  • Safe to merge with the core logic fix, but the broad "safety" keyword check on free-text message fields can reintroduce a variant of the false-positive problem it set out to solve.
  • The fix is correct in intent and the structured-field checks are solid. The risk of false positives from checking errorMessage/error.message for the generic word "safety" is real and addressable, which keeps confidence from being higher. The failure mode (a server error being reported as "refusal" to the user) is the exact UX problem this PR is trying to fix.
  • Pay close attention to the isRefusalErrorPayload function in src/acp/translator.ts, specifically the containsRefusalErrorHint(payload.errorMessage) and containsRefusalErrorHint(error?.message) checks with the "safety" hint.

Last reviewed commit: e51a4a5

@pejmanjohn pejmanjohn force-pushed the fix/acp-stop-reason-mapping branch from e51a4a5 to f4653eb March 9, 2026 15:49
@pejmanjohn
Contributor Author

Addressed @greptile-apps feedback on free-text false positives, plus additional hardening:

Two-tier hint matching: Restructured isRefusalErrorPayload to distinguish between structured fields and free-text fields:

  • Structured fields (reason, code, errorCode, error.type, etc.): Check all hints — content_filter, safety, refusal
  • Free-text fields (errorMessage, error.message): Only check content_filter — the only truly unambiguous signal

This prevents false positives like "Memory safety check failed" or "Type safety violation" from triggering refusal. We also excluded "refusal" from free-text to guard against edge cases like "Connection refusal timeout".
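
A rough reconstruction of the two-tier matching described in this comment is sketched below. The constant names, `isRefusalErrorPayload`, and the field list appear in this conversation, but the function bodies, the `LooseErrorPayload` type, and the `containsHint` helper are assumptions; the committed code was not shown here and this heuristic was removed in a later revision of the PR.

```typescript
// Hint lists as described in this conversation.
const REFUSAL_HINTS_STRUCTURED = ["content_filter", "safety", "refusal"];
const REFUSAL_HINTS_FREETEXT = ["content_filter"];

// Hypothetical loose payload type covering the fields named in the review.
type LooseErrorPayload = {
  reason?: unknown;
  errorReason?: unknown;
  stopReason?: unknown;
  code?: unknown;
  errorCode?: unknown;
  errorMessage?: unknown;
  error?: { reason?: unknown; type?: unknown; code?: unknown; message?: unknown };
};

// Hypothetical helper: case-insensitive substring match on string values only.
function containsHint(value: unknown, hints: readonly string[]): boolean {
  if (typeof value !== "string") return false;
  const text = value.toLowerCase();
  return hints.some((hint) => text.includes(hint));
}

function isRefusalErrorPayload(payload: LooseErrorPayload): boolean {
  // Tier 1: structured fields are checked against every hint.
  const structured = [
    payload.reason, payload.errorReason, payload.stopReason, payload.code,
    payload.errorCode, payload.error?.reason, payload.error?.type, payload.error?.code,
  ];
  // Tier 2: free-text fields are checked against "content_filter" only.
  const freeText = [payload.errorMessage, payload.error?.message];
  return (
    structured.some((value) => containsHint(value, REFUSAL_HINTS_STRUCTURED)) ||
    freeText.some((value) => containsHint(value, REFUSAL_HINTS_FREETEXT))
  );
}

console.log(isRefusalErrorPayload({ errorMessage: "Memory safety check failed" }));
console.log(isRefusalErrorPayload({ reason: "safety" }));
```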

Additional changes beyond Greptile feedback:

  • Added diagnostic logging: error state mapped to <reason> for session <key>
  • Added 4 new test cases (7 total): bare error payload, free-text "safety" false positive guard, structured "safety" → refusal, free-text "content_filter" → refusal

All 128 ACP tests pass.

@greptile-apps
Contributor

greptile-apps bot commented Mar 9, 2026

Excellent work! The implementation exactly matches your description:

Verified:

  • ✅ Two-tier hint matching: REFUSAL_HINTS_STRUCTURED vs REFUSAL_HINTS_FREETEXT arrays (lines 64-65)
  • ✅ 8 structured fields check all hints (reason, errorReason, stopReason, code, errorCode, error.reason, error.type, error.code)
  • ✅ 2 free-text fields (errorMessage, error.message) only check content_filter
  • ✅ Diagnostic logging at line 517: error state mapped to <reason> for session <key>
  • ✅ All 7 test cases present, including the false-positive guards

The comment on lines 92-93 clearly documents the rationale: "'safety' and 'refusal' excluded to avoid non-refusal false positives". This prevents exactly the scenario you mentioned: "Memory safety check failed" won't trigger a false refusal.

The structured/free-text distinction is a smart design choice — it maximizes coverage for semantic fields while preventing incidental keyword matches in error strings. Well done!


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4653ebedc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@pejmanjohn pejmanjohn force-pushed the fix/acp-stop-reason-mapping branch from f4653eb to 1b30b59 March 9, 2026 16:01
@pejmanjohn
Contributor Author

Significantly revised after deeper investigation. @greptile-apps

What changed: Removed the isRefusalErrorPayload heuristic entirely. Investigating the ChatEventSchema revealed that error events only carry errorMessage (free text) — there are no structured fields like reason or errorCode (additionalProperties: false). The heuristic was checking 10 fields that cannot exist on real gateway error events, plus unreliable substring matching on free text.

New approach: Error states simply map to end_turn. A TODO comment notes that proper refusal detection requires adding a structured errorKind field to the schema upstream.

The diff from main is now one behavioral line ("refusal" → "end_turn") plus a comment explaining why and what the real fix would be.

@greptile-apps
Contributor

greptile-apps bot commented Mar 9, 2026

Good call on the investigation. You're absolutely right — ChatEventSchema only has errorMessage with additionalProperties: false, so all those structured fields (reason, errorCode, error.type, etc.) can't exist on actual gateway events. The heuristic was dead code.

Mapping all errors to end_turn is strictly more correct than the unconditional refusal that was there before. The TODO comment properly documents that the real fix requires adding a structured errorKind field upstream.

The approach is sound. ✅

@pejmanjohn pejmanjohn force-pushed the fix/acp-stop-reason-mapping branch from 7fe2c19 to 16cae36 March 9, 2026 16:11
@onutc onutc self-assigned this Mar 9, 2026
@onutc onutc force-pushed the fix/acp-stop-reason-mapping branch from 16cae36 to b1e23ba March 9, 2026 16:37
@onutc onutc merged commit eab39c7 into openclaw:main Mar 9, 2026
2 of 3 checks passed
@onutc
Contributor

onutc commented Mar 9, 2026

Landed via temp rebase onto main.

  • Gate: pnpm lint && pnpm build && pnpm test
  • Land commit: b1e23ba
  • Merge commit: eab39c7

Thanks @pejmanjohn!


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1e23baac9

// do not treat transient backend errors (timeouts, rate-limits) as deliberate
// refusals. TODO: when ChatEventSchema gains a structured errorKind field
// (e.g. "refusal" | "timeout" | "rate_limit"), use it to distinguish here.
this.finishPrompt(pending.sessionId, pending, "end_turn");

P1: Preserve failure signal for chat error events

Mapping state: "error" to stopReason: "end_turn" turns backend failures into what ACP clients interpret as a normal completion, so timeouts/rate-limit/API failures can no longer be distinguished from successful turns. The gateway explicitly emits state: "error" for failed runs (with errorMessage), but this branch now resolves the prompt as success and drops that error context, which means clients lose retry/error-handling signals and can silently continue as if the agent finished normally.
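
For context, the TODO in the reviewed hunk points at a structured `errorKind` field. A minimal sketch of how that could look is below, assuming the field is added to the event; the `ChatErrorEvent` type and `stopReasonForErrorEvent` function are hypothetical, and `errorKind` does not exist in ChatEventSchema at the time of this PR.

```typescript
// Values taken from the TODO comment; the field itself is hypothetical.
type ErrorKind = "refusal" | "timeout" | "rate_limit";

// Assumed future event shape. Today only `errorMessage` is available.
type ChatErrorEvent = {
  state: "error";
  errorMessage?: string;
  errorKind?: ErrorKind;
};

// Only an explicit structured refusal maps to "refusal"; everything else,
// including events with no errorKind, keeps the current "end_turn" behavior.
function stopReasonForErrorEvent(event: ChatErrorEvent): "refusal" | "end_turn" {
  return event.errorKind === "refusal" ? "refusal" : "end_turn";
}

console.log(stopReasonForErrorEvent({ state: "error", errorKind: "refusal" }));
console.log(stopReasonForErrorEvent({ state: "error", errorKind: "timeout" }));
console.log(stopReasonForErrorEvent({ state: "error", errorMessage: "gateway crashed" }));
```

In this sketch transient kinds still resolve as "end_turn", so it restores the refusal distinction but does not by itself give clients a separate failure signal.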

vincentkoc pushed a commit that referenced this pull request Mar 9, 2026
…al (#41187)

* fix(acp): map error states to end_turn instead of unconditional refusal

* fix: map ACP error stop reason to end_turn (#41187) (thanks @pejmanjohn)

---------

Co-authored-by: Pejman Pour-Moezzi <[email protected]>
Co-authored-by: Onur <[email protected]>
freerk added a commit to freerk/openclaw-speedtrap that referenced this pull request Mar 9, 2026
* main: (123 commits)
  acp: fail honestly in bridge mode (openclaw#41424)
  Gateway: tighten node pending drain semantics (openclaw#41429)
  Gateway: add pending node work primitives (openclaw#41409)
  fix(auth): reset cooldown error counters on expiry to prevent infinite escalation (openclaw#41028)
  fix(cron): do not misclassify empty/NO_REPLY as interim acknowledgement (openclaw#41401)
  iOS: reconnect gateway on foreground return (openclaw#41384)
  Doctor: fix non-interactive cron repair gating (openclaw#41386)
  Agents: add embedded error observations (openclaw#41336)
  Cron: enforce cron-owned delivery contract (openclaw#40998)
  fix(telegram): bridge direct delivery to internal message:sent hooks (openclaw#40185)
  plugins: harden global hook runner state (openclaw#40184)
  fix(acp): propagate setSessionMode gateway errors to client (openclaw#41185)
  fix(acp): map error states to end_turn instead of unconditional refusal (openclaw#41187)
  Update CONTRIBUTING.md
  Add Robin Waslander to maintainers
  Update CONTRIBUTING.md
  Allow ACP sessions.patch lineage fields on ACP session keys (openclaw#40995)
  fix(agents): bound compaction retry wait and drain embedded runs on restart (openclaw#40324)
  test(context-engine): add bundle chunk isolation tests for registry (openclaw#40460)
  fix(swiftformat): exclude HostEnvSecurityPolicy.generated.swift from formatters (openclaw#39969)
  ...
ademczuk pushed a commit to ademczuk/openclaw that referenced this pull request Mar 10, 2026
mukhtharcm pushed a commit to hnykda/openclaw that referenced this pull request Mar 10, 2026
jenawant pushed a commit to jenawant/openclaw that referenced this pull request Mar 10, 2026
aiwatching pushed a commit to aiwatching/openclaw that referenced this pull request Mar 10, 2026
Moshiii pushed a commit to Moshiii/openclaw that referenced this pull request Mar 11, 2026
Moshiii pushed a commit to Moshiii/openclaw that referenced this pull request Mar 11, 2026
dominicnunez pushed a commit to dominicnunez/openclaw that referenced this pull request Mar 11, 2026
dhoman pushed a commit to dhoman/chrono-claw that referenced this pull request Mar 11, 2026
Ruijie-Ysp pushed a commit to Ruijie-Ysp/clawdbot that referenced this pull request Mar 12, 2026
qipyle pushed a commit to qipyle/openclaw that referenced this pull request Mar 12, 2026
GGzili pushed a commit to GGzili/moltbot that referenced this pull request Mar 12, 2026
Taskle pushed a commit to Taskle/openclaw that referenced this pull request Mar 14, 2026
senw-developers pushed a commit to senw-developers/va-openclaw that referenced this pull request Mar 17, 2026
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 22, 2026
(cherry picked from commit eab39c7)
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 22, 2026
(cherry picked from commit eab39c7)
dustin-olenslager pushed a commit to dustin-olenslager/ironclaw-supreme that referenced this pull request Mar 24, 2026
0x666c6f added a commit to 0x666c6f/openclaw that referenced this pull request Mar 26, 2026
* refactor: share Apple talk config parsing

* refactor: add canonical talk config payload

* refactor: centralize talk silence timeout defaults

* test: cover invalid talk config inputs

* test: decouple ios talk parsing coverage

* fix: resolve live config paths in status and gateway metadata (openclaw#39952)

* fix: resolve live config paths in status and gateway metadata

* fix: resolve remaining runtime config path references

* test: cover gateway config.set config path response

* fix(web-search): restore OpenRouter compatibility for Perplexity (openclaw#39937) (openclaw#39937)

* Zalo: fix provider lifecycle restarts (openclaw#39892)

* Zalo: fix provider lifecycle restarts

* Zalo: add typing indicators, smart webhook cleanup, and API type fixes

* fix review

* add allow list test secrect

* Zalo: bound webhook cleanup during shutdown

* Zalo: bound typing chat action timeout

* Zalo: use plugin-safe abort helper import

* fix(plugins): ship Feishu bundled runtime dependency (openclaw#39990)

* fix: ship feishu bundled runtime dependency

* test: align feishu bundled dependency specs

* fix(hooks): use resolveAgentIdFromSessionKey in runBeforeReset  (openclaw#39875)

Merged via squash.

Prepared head SHA: 00a2b24
Co-authored-by: rbutera <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* CLI: include commit hash in --version output (openclaw#39712)

* CLI: include commit hash in --version output

* fix(version): harden commit SHA resolution and keep output consistent

* CLI: keep install checks compatible with commit-tagged version output

* fix(cli): include commit hash in root version fast path

* test(cli): allow null commit-hash mocks

* Installer: share version parser across install scripts

* Installer: avoid sourcing helpers from stdin cwd

* CLI: note commit-tagged version output

* CLI: anchor commit hash resolution to module root

* CLI: harden commit hash resolution

* CLI: fix commit hash lookup edge cases

* CLI: prefer live git metadata in dev builds

* CLI: keep git lookup inside package root

* Infra: tolerate invalid moduleUrl hints

* CLI: cache baked commit metadata fallbacks

* CLI: align changelog attribution with prep gate

* CLI: restore changelog contributor credit

---------

Co-authored-by: echoVic <[email protected]>
Co-authored-by: echoVic <[email protected]>

* fix: fail closed talk provider selection

* fix: align talk config secret schemas

* refactor: require canonical talk resolved payload

* test: add talk config contract fixtures

* refactor: split talk gateway config loaders

* refactor: avoid checkout during prep head verification

* refactor: dedupe prep branch push flow

* fix: treat model api drift as baseUrl refresh

* refactor: extract pure models config merge helpers

* refactor: expand provider capability registry

* refactor: reuse one models.json read per write

* fix: publish models.json atomically

* refactor: scope prep push results to env artifacts

* refactor: fold implicit provider injection into resolver

* fix(telegram): use message previews in DMs

* CI: scope CodeQL JavaScript analysis

* feat: allow compaction model override via config (openclaw#38753)

Merged via squash.

Prepared head SHA: a3d6d6c
Co-authored-by: starbuck100 <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman

* fix(sessions): clear stale contextTokens on model switch (openclaw#38044)

Merged via squash.

Prepared head SHA: bac2df4
Co-authored-by: yuweuii <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman

* fix: prefer bundled channel plugins over npm duplicates (openclaw#40094)

* fix: prefer bundled channel plugins over npm duplicates

* fix: tighten bundled plugin review follow-ups

* fix: address check gate follow-ups

* docs: add changelog for bundled plugin install fix

* fix: align lifecycle test formatting with CI oxfmt

* docs: update Brave Search API docs for Feb 2026 plan restructuring (openclaw#40111)

Merged via squash.

Prepared head SHA: c651f07
Co-authored-by: remusao <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras

* Add too-many-prs override label handling

* CI: satisfy provider merge fixture typing

* Tests: reduce web search secret-scan noise

* Web search: rename Perplexity auth source helper

* Docs: use placeholder OpenRouter key in Perplexity guide

* Docs: use placeholder OpenRouter key in web tool docs

* Fixtures: normalize talk config API key placeholder

* Tests: lower entropy git commit fixtures

* Chore: widen xxxxx detect-secrets allowlist

* Chore: refresh detect-secrets baseline

* Web search: allowlist Perplexity auth source type name

* Chore: refresh detect-secrets baseline after docs line changes

* Chore: refresh detect-secrets baseline after final scan

* Chore: refresh detect-secrets baseline for Feishu docs

* CLI: set local gateway mode in setup

* Tests: format daemon lifecycle CLI coverage

* refactor: use model compat for anthropic tool payload normalization

* refactor: move bundled extension gap allowlists into manifests

* refactor: thread config runtime env through models config

* refactor: split models registry loading from persistence

* refactor: centralize transcript provider quirks

* refactor: decompose implicit provider resolution

* refactor: extract openai stream wrappers

* refactor: validate bundled extension release metadata

* refactor: extract static provider builders

* refactor: extract provider stream wrappers

* refactor: extract bundled extension manifest parser

* test: standardize hermetic provider env snapshots

* test: add implicit provider matrix coverage

* fix(acp): persist spawned child session history (openclaw#40137)

Merged via squash.

Prepared head SHA: 62de5d5
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* fix: require talk resolved payload

* refactor: dedupe android talk config parsing

* test: expand talk config contract fixtures

* refactor: split android talk voice resolution

* test: isolate plugin loader from mocked module cache

* test: isolate legacy plugin-sdk root import check

* test: isolate git commit resolution fallbacks

* refactor: simplify plugin sdk compatibility aliases

* refactor: centralize acp session resolution guards

* refactor: neutralize context engine runtime bridge

* refactor: split doctor config analysis helpers

* refactor: extract ios watch reply coordinator

* fix: restore acp session meta narrowing

* refactor: extract qmd process runner

* fix: restore gate after rebase

* docs: add Browserbase as hosted remote CDP option

Add Browserbase documentation section alongside the existing Browserless
section in the browser docs. Includes signup instructions, CDP connection
configuration, and environment variable setup for both English and Chinese
(zh-CN) translations.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* Revert "docs: add Browserbase as hosted remote CDP option"

This reverts commit c469657.

* docs: add Browserbase as hosted remote CDP option

Add Browserbase documentation section alongside the existing Browserless
section in the browser docs. Includes signup instructions, CDP connection
configuration, and environment variable setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* docs: fix duplicate heading lint error

Rename "Configuration" sub-heading to "Profile setup" to avoid
MD024/no-duplicate-heading conflict with the existing top-level
"Configuration" heading.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* docs: fix Browserbase section to match official docs

Browserbase requires creating a session via their API to get a CDP
connect URL, unlike Browserless which uses a static endpoint. Updated
to show the correct curl-based session creation flow, removed
unverified static WebSocket URL, and added the 5-minute connect
timeout note from official docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* docs: restore direct wss://connect.browserbase.com URL

Browserbase exposes a direct WebSocket connect endpoint that
auto-creates a session, similar to how Browserless works. Simplified
the section to use this static URL pattern instead of requiring
manual session creation via the API.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* docs: fact-check Browserbase section against official docs

- Fix CAPTCHA/stealth/proxy claims: these are Developer plan+ only,
  not available on free tier
- Fix free tier limits: 1 browser hour, 15-min session duration
  (not "60 minutes of monthly usage")
- Add link to pricing page for paid plan details
- Simplify structure to match Browserless section format
- Remove sub-headings to match Browserless section style

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* docs: simplify Browserbase section, drop pricing details

Restore platform-level feature description (CAPTCHA solving, stealth
mode, proxies) without plan-specific pricing gating. Keep free tier
note brief.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* feat(browser): support direct WebSocket CDP URLs for Browserbase

Browserbase uses direct WebSocket connections (wss://) rather than the
standard HTTP-based /json/version CDP discovery flow used by Browserless.
This change teaches the browser tool to accept ws:// and wss:// URLs as
cdpUrl values: when a WebSocket URL is detected, OpenClaw connects
directly instead of attempting HTTP discovery.

Changes:
- config.ts: accept ws:// and wss:// in cdpUrl validation
- cdp.helpers.ts: add isWebSocketUrl() helper
- cdp.ts: skip /json/version when cdpUrl is already a WebSocket URL
- chrome.ts: probe WSS endpoints via WebSocket handshake instead of HTTP
- cdp.test.ts: add test for direct WebSocket target creation
- docs/tools/browser.md: update Browserbase section with correct URL
  format and notes

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* test+docs: comprehensive coverage and generic framing

- Add 12 new tests covering: isWebSocketUrl detection, parseHttpUrl WSS
  acceptance/rejection, direct WS target creation with query params,
  SSRF enforcement on WS URLs, WS reachability probing bypasses HTTP
- Reframe docs section as generic "Direct WebSocket CDP providers" with
  Browserbase as one example — any WSS-based provider works
- Update security tips to mention WSS alongside HTTPS

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(browser): update existing tests for ws/wss protocol support

Two pre-existing tests still expected ws:// URLs to be rejected by
parseHttpUrl, which now accepts them. Switch the invalid-protocol
fixture to ftp:// and tighten the assertion to match the full
"must be http(s) or ws(s)" error message.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(browser): preserve wss:// cdpUrl in legacy default profile resolution

* chore: remove vendor-specific references from code comments

* style(browser): fix oxfmt formatting in config.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* fix: preserve loopback ws cdp tab ops (openclaw#31085) (thanks @shrey150)

* fix: share context engine registry across bundled chunks (openclaw#40115)

Merged via squash.

Prepared head SHA: 6af4820
Co-authored-by: jalehman <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman

* fix(browser): rewrite 0.0.0.0 and [::] wildcard addresses in CDP WebSocket URLs

Containerized browsers (e.g. browserless in Docker) report
`ws://0.0.0.0:<internal-port>` in their `/json/version` response.
`normalizeCdpWsUrl` rewrites loopback WS hosts to the external
CDP host:port, but `0.0.0.0` and `[::]` were not treated as
addresses needing rewriting, causing OpenClaw to try connecting
to `ws://0.0.0.0:3000` literally — which always fails.
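
The rewrite can be sketched roughly as follows. This is an assumption-laden illustration, not the real `normalizeCdpWsUrl`: the function name `rewriteCdpWsHost` and both host sets are hypothetical.

```typescript
// Illustrative sketch; the real normalizeCdpWsUrl may handle more cases.
const WILDCARD_HOSTS = new Set(["0.0.0.0", "[::]"]);
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost", "[::1]"]);

// Replace a wildcard or loopback WS host with the externally reachable
// CDP host:port that the caller originally configured.
function rewriteCdpWsHost(wsUrl: string, cdpUrl: string): string {
  const ws = new URL(wsUrl);
  const cdp = new URL(cdpUrl);
  if (WILDCARD_HOSTS.has(ws.hostname) || LOOPBACK_HOSTS.has(ws.hostname)) {
    ws.hostname = cdp.hostname;
    ws.port = cdp.port; // an empty string resets to the scheme default
  }
  return ws.toString();
}
```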

Fixes openclaw#17752

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix: normalize wildcard remote CDP websocket URLs (openclaw#17760) (thanks @joeharouni)

* fix(browser): wait for extension tabs after relay drop (openclaw#32331)

* fix: wait for extension relay tab reconnects (openclaw#32461) (thanks @AaronWander)

* fix(infra): make browser relay bind address configurable

Add browser.relayBindHost config option so the Chrome extension relay
server can bind to a non-loopback address (e.g. 0.0.0.0 for WSL2).
Defaults to 127.0.0.1 when unset, preserving current behavior.

Closes openclaw#39214

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(browser): add IP validation, fix upgrade handler for non-loopback bind

- Zod schema: validate relayBindHost with ipv4/ipv6 instead of bare string
- Upgrade handler: allow non-loopback connections when bindHost is explicitly
  non-loopback (e.g. 0.0.0.0 for WSL2), keeping loopback-only default
- Test: verify actual bind address via relay.bindHost instead of just checking
  reachability on 127.0.0.1 which passes regardless
- Expose bindHost on ChromeExtensionRelayServer type for inspection

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix: make browser relay bind address configurable (openclaw#39364) (thanks @mvanhorn)

* docs: add WSL2 + Windows remote Chrome CDP troubleshooting (openclaw#39407) (thanks @Owlock)

* macos: add remote gateway token field for remote mode

* macos: clarify remote token placeholder text

* macos: add mode-toggle remote token sync coverage

* tests: document remote token persistence across mode toggle

* fix(macos): preserve unsupported remote gateway tokens

* docs(changelog): credit macos remote token author

* fix(macos): improve tailscale gateway discovery (openclaw#40167)

Sanitized test tailnet hostnames and re-ran the targeted macOS gateway discovery test suite before merge.

* refactor(browser): scope CDP sessions and harden stale target recovery

* feat: add local backup CLI (openclaw#40163)

Merged via squash.

Prepared head SHA: ed46625
Co-authored-by: shichangs <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras

* fix(ci): scope secrets scan to branch changes

* fix(ci): refresh detect-secrets baseline

* fix: harden backup verify path validation

* fix(plugin-sdk): lazily load legacy root alias

* fix(setup-podman): cd to TMPDIR before podman load to avoid cwd permission error (openclaw#39435)

* fix(setup-podman): cd to TMPDIR before podman load to avoid inherited cwd permission error

* fix(podman): safe cwd in run_as_user to prevent chdir errors

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: sallyom <[email protected]>

---------

Signed-off-by: sallyom <[email protected]>
Co-authored-by: sallyom <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

* fix(cron): consolidate announce delivery, fire-and-forget trigger, and minimal prompt mode (openclaw#40204)

* fix(cron): consolidate announce delivery and detach manual runs

* fix: queue detached cron runs (openclaw#40204)

* Gateway/iOS: replay queued foreground actions safely after resume (openclaw#40281)

Merged via squash.

- Local validation: `pnpm exec vitest run --config vitest.gateway.config.ts src/gateway/server-methods/nodes.invoke-wake.test.ts`
- Local validation: `pnpm build`
- mb-server validation: `pnpm exec vitest run --config vitest.gateway.config.ts src/gateway/server-methods/nodes.invoke-wake.test.ts`
- mb-server validation: `pnpm build`
- mb-server validation: `pnpm protocol:check`

* iOS: auto-load the scoped gateway canvas with safe fallback (openclaw#40282)

Merged via squash.

- mb-server validation: `swift test --package-path apps/shared/OpenClawKit --filter GatewayNodeSessionTests`
- mb-server validation: `pnpm build`
- Scope note: top-level `RootTabs` shell change was intentionally removed from this PR before merge

* Update CHANGELOG.md

* Update CHANGELOG.md

* fix(run-openclaw-podman): add SELinux :Z mount option on enforcing/permissive hosts (openclaw#39449)

* fix(run-openclaw-podman): add SELinux :Z mount option on Linux with enforcing/permissive SELinux

* fix(quadlet): add SELinux :Z label to openclaw.container.in volume mount

* fix(podman): add SELinux :Z mount option for Fedora/RHEL hosts

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: sallyom <[email protected]>

---------

Signed-off-by: sallyom <[email protected]>
Co-authored-by: sallyom <[email protected]>
Co-authored-by: Claude Opus 4.6 <[email protected]>

* Docker: trim runtime image payload (openclaw#40307)

* Docker: shrink runtime image payload

* Docker: add runtime pnpm opt-in

* Docker: collapse helper entrypoint chmod layers

* Docker: restore bundled pnpm runtime

* Update CHANGELOG.md

* docs(changelog): move post-2026.3.8 entries to unreleased (openclaw#40342)

* docs(changelog): move post-2026.3.8 entries to unreleased

* Update CHANGELOG.md

* fix(tui): improve color contrast for light-background terminals (openclaw#40345)

* fix(tui): improve colour contrast for light-background terminals (openclaw#38636)

Detect light terminal backgrounds via COLORFGBG and apply a WCAG
AA-compliant light palette. Adds OPENCLAW_THEME=light|dark env var
override for terminals without auto-detection.

Uses proper sRGB linearisation and WCAG 2.1 contrast ratios to pick
whichever text palette (dark or light) has higher contrast against
the detected background colour.
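
For reference, the contrast computation can be sketched as below. The luminance weights and the ratio formula follow the WCAG 2.1 definitions. The 0.04045 threshold is the sRGB value (the WCAG text has historically printed 0.03928). `pickTextColor` is a hypothetical helper, not the function name used in this change.

```typescript
type Rgb = [number, number, number]; // 0-255 per channel

// Convert one gamma-encoded sRGB channel to linear light.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance of a colour.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// WCAG contrast ratio, ranging from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Hypothetical: choose whichever candidate text colour contrasts more
// strongly with the detected background.
function pickTextColor(background: Rgb, darkText: Rgb, lightText: Rgb): Rgb {
  return contrastRatio(background, darkText) >= contrastRatio(background, lightText)
    ? darkText
    : lightText;
}
```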

Co-authored-by: ademczuk <[email protected]>

* Update CHANGELOG.md

---------

Co-authored-by: ademczuk <[email protected]>
Co-authored-by: ademczuk <[email protected]>

* fix(models): keep --all aligned with synthetic catalog rows

* test(models): refresh list assertions after main sync

* fix: normalize openai-codex gpt-5.4 transport overrides

* docs: add refactor cluster backlog

* refactor: dedupe plugin runtime stores

* refactor: share gateway argv parsing

* refactor: extract gateway port diagnostics helper

* refactor: reuse broadcast route key construction

* refactor: share multi-account config schema fragments

* test: dedupe brave llm-context rejection cases

* refactor: share channel config adapter base

* fix(agents): bootstrap runtime plugins before context-engine resolution

* docs(changelog): remove rebase marker

* refactor: harden browser relay CDP flows

* fix(config): refresh runtime snapshot from disk after write. Fixes openclaw#37175 (openclaw#37313)

Merged via squash.

Prepared head SHA: 69e1861
Co-authored-by: bbblending <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras

* refactor: harden browser runtime profile handling

* refactor(models): extract list row builders

* refactor(agents): extract provider model normalization

* refactor(models): split models.json planning from writes

* refactor(models): split provider discovery helpers

* chore(docs): drop refactor cleanup tracker

* gateway: fix global Control UI 404s for symlinked wrappers and bundled package roots (openclaw#40385)

Merged via squash.

Prepared head SHA: 567b3ed
Co-authored-by: velvet-shark <[email protected]>
Co-authored-by: velvet-shark <[email protected]>
Reviewed-by: @velvet-shark

* Docker: improve build cache reuse (openclaw#40351)

* Docker: improve build cache reuse

* Tests: cover Docker build cache layout

* Docker: fix sandbox cache mount continuations

* Docker: document qr-import manifest scope

* Docker: narrow e2e install inputs

* CI: cache Docker builds in workflows

* CI: route sandbox smoke through setup script

* CI: keep sandbox smoke on script path

* fix(tests): correct security check failure

* docs(changelog): correct Control UI contributor credit (openclaw#40420)

Merged via squash.

Prepared head SHA: e4295fe
Co-authored-by: velvet-shark <[email protected]>
Co-authored-by: velvet-shark <[email protected]>
Reviewed-by: @velvet-shark

* fix(models): use 1M context for openai-codex gpt-5.4 (openclaw#37876)

Merged via squash.

Prepared head SHA: c410207
Co-authored-by: yuweuii <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman

* fix(telegram): add download timeout to prevent polling loop hang (openclaw#40098)

Merged via squash.

Prepared head SHA: abdfa1a
Co-authored-by: tysoncung <[email protected]>
Co-authored-by: obviyus <[email protected]>
Reviewed-by: @obviyus

* ACP: add optional ingress provenance receipts (openclaw#40473)

Merged via squash.

Prepared head SHA: b63e46d
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* alphabetize web search providers (openclaw#40259)

Merged via squash.

Prepared head SHA: be6350e
Co-authored-by: kesku <[email protected]>
Co-authored-by: obviyus <[email protected]>
Reviewed-by: @obviyus

* fix(plugin-sdk): remove remaining bundled plugin src imports (openclaw#39638)

Verified:
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: Kyle <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* test: fix android talk config contract fixture

* chore(acpx): move runtime test fixtures to test-utils (openclaw#40548)

Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini

* fix(agents): re-expose configured tools under restrictive profiles

* fix(media): accept reader read result type

* build(protocol): sync generated swift models

* fix: dedupe inbound Telegram DM replies per agent (openclaw#40519)

Merged via squash.

Prepared head SHA: 6e235e7
Co-authored-by: obviyus <[email protected]>
Co-authored-by: obviyus <[email protected]>
Reviewed-by: @obviyus

* fix(matrix): restore robust DM routing without the memberCount heuristic (openclaw#19736)

* fix(matrix): remove memberCount heuristic from DM detection

The memberCount === 2 check in isDirectMessage() misclassifies 2-person
group rooms (admin channels, monitoring rooms) as DMs, routing them to
the main session instead of their room-specific session.

Matrix already distinguishes DMs from groups at the protocol level via
m.direct account data and is_direct member state flags. Both are already
checked by client.dms.isDm() and hasDirectFlag(). The memberCount
heuristic only adds false positives for 2-person groups.

Move resolveMemberCount() below the protocol-level checks so it is only
reached for rooms not matched by m.direct or is_direct. This narrows its
role to diagnostic logging for confirmed group rooms.

Refs: openclaw#19739

* fix(matrix): add conservative fallback for broken DM flags

Some homeservers (notably Continuwuity) have broken m.direct account
data or never set is_direct on invite events. With the memberCount
heuristic removed, these DMs are no longer detected.

Add a conservative fallback that requires two signals before classifying
as DM: memberCount === 2 AND no explicit m.room.name. Group rooms almost
always have explicit names; DMs almost never do.

Error handling distinguishes M_NOT_FOUND (missing state event, expected
for unnamed rooms) from network/auth errors. Non-404 errors fall through
to group classification rather than guessing.
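
A rough sketch of the two-signal fallback, under stated assumptions: the `RoomNameLookup` shape and the function name are hypothetical, and treating a whitespace-only name as "no name" is a guess rather than confirmed behavior.

```typescript
// Hypothetical result of looking up the m.room.name state event.
type RoomNameLookup =
  | { kind: "found"; name: string }
  | { kind: "not_found" } // M_NOT_FOUND: state event missing, expected for unnamed rooms
  | { kind: "error" };    // network/auth failure

// Classify as DM only when both signals agree: exactly two members
// AND no explicit room name. Lookup errors fall through to "group".
function isLikelyDmFallback(memberCount: number, roomName: RoomNameLookup): boolean {
  if (memberCount !== 2) return false;
  if (roomName.kind === "error") return false;
  if (roomName.kind === "not_found") return true;
  // Assumption: a blank or whitespace-only name counts as unnamed.
  return roomName.name.trim() === "";
}
```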

This is independently revertable — removing this commit restores pure
protocol-based detection without any heuristic fallback.

* fix(matrix): add parentPeer for DM room binding support

Add parentPeer to DM routes so conversations are bindable by room ID
while preserving DM trust semantics (secure 1:1, no group restrictions).

Suggested by @KirillShchetinin.

* fix(matrix): override DM detection for explicitly configured rooms

Builds on @robertcorreiro's config-driven approach from openclaw#9106.

Move resolveMatrixRoomConfig() before the DM check. If a room matches
a non-wildcard config entry (matchSource === "direct") and was
classified as DM, override the classification to group. This gives users
a deterministic escape hatch for misclassified rooms.

Wildcards are excluded from the override to avoid breaking DM routing
when a "*" catch-all exists. roomConfig is gated behind isRoom so DMs
never inherit group settings (skills, systemPrompt, autoReply).

This commit is independently droppable if the scope is too broad.

* test(matrix): add DM detection and config override tests

- 15 unit tests for direct.ts: all detection paths, priority order,
  M_NOT_FOUND vs network error handling, edge cases (whitespace names,
  API failures)
- 8 unit tests for rooms.ts: matchSource classification, wildcard
  safety for DM override, direct match priority over wildcard

* Changelog: note matrix DM routing follow-up

* fix(matrix): preserve DM fallback and room bindings

---------

Co-authored-by: Tak Hoffman <[email protected]>

* Fix cron text announce delivery for Telegram targets (openclaw#40575)

Merged via squash.

Prepared head SHA: 54b1513
Co-authored-by: obviyus <[email protected]>
Co-authored-by: obviyus <[email protected]>
Reviewed-by: @obviyus

* fix: clear plugin discovery cache after plugin installation (openclaw#39752)

Verified:
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: GazeKingNuWu <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* test: fix windows secrets runtime ci

* fix(daemon): enable LaunchAgent before bootstrap on restart

restartLaunchAgent was missing the launchctl enable call that
installLaunchAgent already performs. launchd can persist a "disabled"
state after bootout, causing bootstrap to silently fail and leaving the
gateway unloaded until a manual reinstall.

Fixes openclaw#39211

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(daemon): also enable LaunchAgent in repairLaunchAgentBootstrap

The repair/recovery path had the same missing `enable` guard as
`restartLaunchAgent`. If launchd persists a "disabled" state after a
previous `bootout`, the `bootstrap` call in `repairLaunchAgentBootstrap`
fails silently, leaving the gateway unloaded in the recovery flow.

Add the same `enable` guard before `bootstrap` that was already applied
to `installLaunchAgent` and (in this PR) `restartLaunchAgent`.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(gateway): exit non-zero on restart shutdown timeout

When a config-change restart hits the force-exit timeout, exit with
code 1 instead of 0 so launchd/systemd treats it as a failure and
triggers a clean process restart. Stop-timeout stays at exit(0)
since graceful stops should not cause supervisor recovery.
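
As a sketch, the exit-code choice amounts to the following; `ShutdownKind` and `forceExitCode` are hypothetical names for illustration, not identifiers from the gateway code:

```typescript
type ShutdownKind = "restart" | "stop";

// Exit code to use when the force-exit timeout fires.
// restart: non-zero so the supervisor treats it as a failure and respawns.
// stop: zero so the supervisor does not attempt recovery.
function forceExitCode(kind: ShutdownKind): number {
  return kind === "restart" ? 1 : 0;
}
```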

Closes openclaw#36822

* test(secrets): skip ACL-dependent runtime snapshot tests on windows

* fix: add changelog for restart timeout recovery (openclaw#40380) (thanks @dsantoreis)

* fix(browser): enforce redirect-hop SSRF checks

* fix(cron): restore owner-only tools for isolated runs

* test(cron): cover owner-only tool availability

* fix(msteams): enforce sender allowlists with route allowlists

* fix(gateway): validate config before restart to prevent crash + macOS permission loss (openclaw#35862)

When 'openclaw gateway restart' is run with an invalid config, the new
process crashes on startup due to config validation failure. On macOS,
this causes Full Disk Access (TCC) permissions to be lost because the
respawned process has a different PID.

Add getConfigValidationError() helper and pre-flight config validation
in both runServiceRestart() and runServiceStart(). If config is invalid,
abort with a clear error message instead of crashing.

The config watcher's hot-reload path already had this guard
(handleInvalidSnapshot), but the CLI restart/start commands did not.

AI-assisted (OpenClaw agent, fully tested)

* fix(gateway): catch startup failure in run loop to prevent process exit (openclaw#35862)

When an in-process restart (SIGUSR1) triggers a config-triggered restart
and the new config is invalid, params.start() throws and the while loop
exits, killing the process. On macOS this loses TCC permissions.

Wrap params.start() in try/catch: on failure, set server=null, log the
error, and wait for the next SIGUSR1 instead of crashing.

* test: add runServiceStart config pre-flight tests (openclaw#35862)

Address Greptile review: add test coverage for runServiceStart path.
The error message copy-paste issue was already fixed in the DRY refactor
(uses params.serviceNoun instead of hardcoded 'restart').

* fix: address bot review feedback on openclaw#35862

- Remove dead 'return false' in runServiceStart (Greptile)
- Include stack trace in run-loop crash guard error log (Greptile)
- Only catch startup errors on subsequent restarts, not initial start (Codex P1)
- Add JSDoc note about env var false positive edge case (Codex P1)

* fix: move config pre-flight before onNotLoaded in runServiceRestart (Codex P2)

The config check was positioned after onNotLoaded, which could send
SIGUSR1 to an unmanaged process before config was validated.

* fix: release gateway lock on restart failure + reply to Codex reviews

- Release gateway lock when in-process restart fails, so daemon
  restart/stop can still manage the process (Codex P2)
- P1 (env mismatch) already addressed: best-effort by design, documented
  in JSDoc

* fix(gateway): detect launchd supervision via XPC_SERVICE_NAME

On macOS, launchd sets XPC_SERVICE_NAME on managed processes but does
not set LAUNCH_JOB_LABEL or LAUNCH_JOB_NAME. Without checking
XPC_SERVICE_NAME, isLikelySupervisedProcess() returns false for
launchd-managed gateways, causing restartGatewayProcessWithFreshPid()
to fork a detached child instead of returning "supervised". The
detached child holds the gateway lock while launchd simultaneously
respawns the original process (KeepAlive=true), leading to an infinite
lock-timeout / restart loop.
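
A minimal sketch of the environment check, assuming supervision is inferred only from the three variables named above. The real `isLikelySupervisedProcess()` may consider more signals or filter placeholder values.

```typescript
// Env vars treated as hints that a supervisor launched this process.
const SUPERVISOR_ENV_HINTS = ["LAUNCH_JOB_LABEL", "LAUNCH_JOB_NAME", "XPC_SERVICE_NAME"];

// Illustrative only: true when any hint variable is set to a non-blank value.
function isLikelySupervisedProcess(env: Record<string, string | undefined>): boolean {
  return SUPERVISOR_ENV_HINTS.some((key) => (env[key] ?? "").trim() !== "");
}
```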

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix: detect launchd supervision via xpc service name (openclaw#20555) (thanks @dimat)

* fix(node-host): bind bun and deno approval scripts

* fix(skills): pin validated download roots

* fix(telegram): abort in-flight getUpdates fetch on shutdown

When the gateway receives SIGTERM, runner.stop() stops the grammY polling
loop but does not abort the in-flight getUpdates HTTP request. That request
hangs for up to 30 seconds (the Telegram API timeout). If a new gateway
instance starts polling during that window, Telegram returns a 409 Conflict
error, causing message loss and requiring exponential backoff recovery.

This is especially problematic with service managers (launchd, systemd)
that restart the process immediately after SIGTERM.

Wire an AbortController into the fetch layer so every Telegram API request
(especially the long-polling getUpdates) aborts immediately on shutdown:

- bot.ts: Accept optional fetchAbortSignal in TelegramBotOptions; wrap
  the grammY fetch with AbortSignal.any() to merge the shutdown signal.
- monitor.ts: Create a per-iteration AbortController, pass its signal to
  createTelegramBot, and abort it from the SIGTERM handler, force-restart
  path, and finally block.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(telegram): use manual signal forwarding to avoid cross-realm AbortSignal

AbortSignal.any() fails in Node.js when signals come from different module
contexts (grammY's internal signal vs local AbortController), producing:
"The signals[0] argument must be an instance of AbortSignal. Received an
instance of AbortSignal".

Replace with manual event forwarding that works across all realms.
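
Manual forwarding can be sketched like this. `forwardAbortSignals` is a hypothetical name; the approach relies only on `addEventListener`, so it involves no `instanceof AbortSignal` check on the inputs.

```typescript
// Merge several signals into one fresh signal by forwarding "abort" events.
function forwardAbortSignals(signals: Array<AbortSignal | undefined>): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (!signal) continue;
    if (signal.aborted) {
      // Already aborted: propagate immediately and stop wiring listeners.
      controller.abort(signal.reason);
      break;
    }
    signal.addEventListener("abort", () => controller.abort(signal.reason), { once: true });
  }
  return controller.signal;
}
```

Listeners on the non-firing signals stay registered in this sketch. Long-lived callers would likely want to remove them once the merged signal aborts.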

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix: abort telegram getupdates on shutdown (openclaw#23950) (thanks @Gkinthecodeland)

* fix(cron): stagger missed jobs on restart to prevent gateway overload

When the gateway restarts with many overdue cron jobs, they are now
executed with staggered delays to prevent overwhelming the gateway.

- Add missedJobStaggerMs config (default 5s between jobs)
- Add maxMissedJobsPerRestart limit (default 5 jobs immediately)
- Prioritize most overdue jobs by sorting by nextRunAtMs
- Reschedule deferred jobs to fire gradually via normal timer
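
One way to picture the plan, as a sketch under assumptions: `planMissedJobs`, `MissedJob`, and the exact deferral arithmetic are hypothetical. Only the two config names, their defaults, and the sort key come from the description above.

```typescript
interface MissedJob {
  id: string;
  nextRunAtMs: number;
}

interface StaggerPlan {
  runNow: MissedJob[];
  deferred: Array<{ job: MissedJob; runAtMs: number }>;
}

function planMissedJobs(
  jobs: MissedJob[],
  nowMs: number,
  maxMissedJobsPerRestart = 5,
  missedJobStaggerMs = 5000,
): StaggerPlan {
  // Most overdue first: smallest nextRunAtMs sorts to the front.
  const overdue = jobs
    .filter((job) => job.nextRunAtMs <= nowMs)
    .sort((a, b) => a.nextRunAtMs - b.nextRunAtMs);
  const runNow = overdue.slice(0, maxMissedJobsPerRestart);
  // Assumption: each deferred job is pushed one stagger interval further out.
  const deferred = overdue
    .slice(maxMissedJobsPerRestart)
    .map((job, index) => ({ job, runAtMs: nowMs + (index + 1) * missedJobStaggerMs }));
  return { runNow, deferred };
}
```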

Fixes openclaw#18892

* fix: stagger missed cron jobs on restart (openclaw#18925) (thanks @rexlunae)

* build: update app deps except carbon

* refactor: extract telegram polling session

* refactor: split cron startup catch-up flow

* refactor: flatten supervisor marker hints

* docs: reorder 2026.3.8 changelog by impact

* build: sync pnpm lockfile

* chore: refresh secrets baseline

* docs: move 2026.3.8 entries back to unreleased

* test: fix Node 24+ test runner and subagent registry mocks

* test: fix Windows fake runtime bin fixtures

* fix: normalize windows runtime shim executables

* chore: prepare 2026.3.8-beta.1 release

* chore: update appcast for 2026.3.8-beta.1

* test: fix windows runtime and restart loop harnesses

* fix(update): re-enable launchd service before updater bootstrap

* chore: prepare 2026.3.8 npm release

* test: narrow gateway loop signal harness

* fix(launchd): harden macOS launchagent install permissions

* fix(onboard): avoid persisting talk fallback on fresh setup

* build: bump unreleased version to 2026.3.9

* fix: stabilize launchd paths and appcast secret scan

* build: sync plugin versions for 2026.3.9

* fix(ui): preserve control-ui auth across refresh (openclaw#40892)

Merged via squash.

Prepared head SHA: f9b2375
Co-authored-by: velvet-shark <[email protected]>
Co-authored-by: velvet-shark <[email protected]>
Reviewed-by: @velvet-shark

* fix(kimi-coding): fix kimi tool format: use native Anthropic tool schema instead of OpenAI … (openclaw#40008)

Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: opriz <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* fix(swiftformat): exclude HostEnvSecurityPolicy.generated.swift from formatters (openclaw#39969)

* test(context-engine): add bundle chunk isolation tests for registry (openclaw#40460)

Merged via squash.

Prepared head SHA: 44622ab
Co-authored-by: dsantoreis <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman

* fix(agents): bound compaction retry wait and drain embedded runs on restart (openclaw#40324)

Merged via squash.

Prepared head SHA: cfd9956
Co-authored-by: cgdusek <[email protected]>
Co-authored-by: jalehman <[email protected]>
Reviewed-by: @jalehman

* Allow ACP sessions.patch lineage fields on ACP session keys (openclaw#40995)

Merged via squash.

Prepared head SHA: c1191ed
Co-authored-by: xaeon2026 <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* Update CONTRIBUTING.md

* Add Robin Waslander to maintainers

* Update CONTRIBUTING.md

* fix(acp): map error states to end_turn instead of unconditional refusal (openclaw#41187)

* fix(acp): map error states to end_turn instead of unconditional refusal

* fix: map ACP error stop reason to end_turn (openclaw#41187) (thanks @pejmanjohn)

---------

Co-authored-by: Pejman Pour-Moezzi <[email protected]>
Co-authored-by: Onur <[email protected]>

* fix(acp): propagate setSessionMode gateway errors to client (openclaw#41185)

* fix(acp): propagate setSessionMode gateway errors to client

* fix: add changelog entry for ACP setSessionMode propagation (openclaw#41185) (thanks @pejmanjohn)

---------

Co-authored-by: Pejman Pour-Moezzi <[email protected]>
Co-authored-by: Onur <[email protected]>

* plugins: harden global hook runner state (openclaw#40184)

* fix(telegram): bridge direct delivery to internal message:sent hooks (openclaw#40185)

* telegram: bridge direct delivery message hooks

* telegram: align sent hooks with command session

* Cron: enforce cron-owned delivery contract (openclaw#40998)

Merged via squash.

Prepared head SHA: 5877389
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* Agents: add embedded error observations (openclaw#41336)

Merged via squash.

Prepared head SHA: 4900042
Co-authored-by: altaywtf <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* Doctor: fix non-interactive cron repair gating (openclaw#41386)

* iOS: reconnect gateway on foreground return (openclaw#41384)

Merged via squash.

Prepared head SHA: 0e2e0dc
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* fix(cron): do not misclassify empty/NO_REPLY as interim acknowledgement (openclaw#41401)

* fix(cron): do not misclassify empty/NO_REPLY as interim acknowledgement

When a cron task's agent returns NO_REPLY, the payload filter strips the
silent token, leaving an empty text string. isLikelyInterimCronMessage()
previously returned true for empty input, causing the cron runner to
inject a forced rerun prompt ('Your previous response was only an
acknowledgement...').

Change the empty-string branch to return false: empty text after payload
filtering means the agent deliberately chose silent completion, not that
it sent an interim 'on it' message.
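
The changed branch can be sketched as follows. The acknowledgement heuristic itself is not described here, so it is passed in as a hypothetical `looksLikeAck` callback rather than guessed at.

```typescript
// Illustrative sketch; the real function takes no callback and has its
// own heuristics for non-empty text.
function isLikelyInterimCronMessage(
  text: string,
  looksLikeAck: (trimmed: string) => boolean,
): boolean {
  const trimmed = text.trim();
  // Empty after payload filtering means silent completion (e.g. NO_REPLY),
  // not an interim acknowledgement, so no forced rerun should follow.
  if (trimmed === "") return false;
  return looksLikeAck(trimmed);
}
```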

Fixes openclaw#41246

* fix(cron): do not misclassify empty/NO_REPLY as interim acknowledgement

Fixes openclaw#41246. (openclaw#41383) thanks @jackal092927.

---------

Co-authored-by: xaeon2026 <[email protected]>

* fix(auth): reset cooldown error counters on expiry to prevent infinite escalation (openclaw#41028)

Merged via squash.

Prepared head SHA: 89bd83f
Co-authored-by: zerone0x <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* Gateway: add pending node work primitives (openclaw#41409)

Merged via squash.

Prepared head SHA: a6d7ca9
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* Gateway: tighten node pending drain semantics (openclaw#41429)

Merged via squash.

Prepared head SHA: 361c2eb
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* acp: fail honestly in bridge mode (openclaw#41424)

Merged via squash.

Prepared head SHA: b5e6e13
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* acp: restore session context and controls (openclaw#41425)

Merged via squash.

Prepared head SHA: fcabdf7
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* Sandbox: import STATE_DIR from paths directly (openclaw#41439)

* acp: enrich streaming updates for ide clients (openclaw#41442)

Merged via squash.

Prepared head SHA: 0764368
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* acp: forward attachments into ACP runtime sessions (openclaw#41427)

Merged via squash.

Prepared head SHA: f2ac51d
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* acp: add regression coverage and smoke-test docs (openclaw#41456)

Merged via squash.

Prepared head SHA: 514d587
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* fix(agents): probe single-provider billing cooldowns (openclaw#41422)

Merged via squash.

Prepared head SHA: bbc4254
Co-authored-by: altaywtf <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* acp: harden follow-up reliability and attachments (openclaw#41464)

Merged via squash.

Prepared head SHA: 7d167df
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* Agents: add fallback error observations (openclaw#41337)

Merged via squash.

Prepared head SHA: 852469c
Co-authored-by: altaywtf <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* build(protocol): regenerate Swift models after pending node work schemas (openclaw#41477)

Merged via squash.

Prepared head SHA: cae0aaf
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky

* fix(discord): apply effective maxLinesPerMessage in live replies (openclaw#40133)

Merged via squash.

Prepared head SHA: 031d032
Co-authored-by: rbutera <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* Logging: harden probe suppression for observations (openclaw#41338)

Merged via squash.

Prepared head SHA: d18356c
Co-authored-by: altaywtf <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Reviewed-by: @altaywtf

* ci(sre:PLA-760): fix smoke workflow fallbacks

* test(sre:PLA-760): allowlist secret-scan false positives

* test(sre:PLA-760): refresh secret baseline for upstream sync

* test(sre:PLA-760): update detect-secrets baseline

* ci(sre:PLA-760): exclude auto-response from zizmor

* fix(ci:PLA-760): unblock audit and bun test lane

* ci(sre:PLA-760): run linux-only ci

---------

Signed-off-by: sallyom <[email protected]>
Co-authored-by: Peter Steinberger <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>
Co-authored-by: Ayaan Zaidi <[email protected]>
Co-authored-by: darkamenosa <[email protected]>
Co-authored-by: Hermione <[email protected]>
Co-authored-by: rbutera <[email protected]>
Co-authored-by: altaywtf <[email protected]>
Co-authored-by: Altay <[email protected]>
Co-authored-by: echoVic <[email protected]>
Co-authored-by: Vincent Koc <[email protected]>
Co-authored-by: GitBuck <[email protected]>
Co-authored-by: starbuck100 <[email protected]>
Co-authored-by: jalehman <[email protected]>
Co-authored-by: yuweuii <[email protected]>
Co-authored-by: Rémi <[email protected]>
Co-authored-by: remusao <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Co-authored-by: Mariano <[email protected]>
Co-authored-by: Shrey Pandya <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Co-authored-by: Josh Lehman <[email protected]>
Co-authored-by: Joe Harouni <[email protected]>
Co-authored-by: AaronWander <[email protected]>
Co-authored-by: Matt Van Horn <[email protected]>
Co-authored-by: Charles Dusek <[email protected]>
Co-authored-by: Nimrod Gutman <[email protected]>
Co-authored-by: shichangs <[email protected]>
Co-authored-by: Gustavo Madeira Santana <[email protected]>
Co-authored-by: langdon <[email protected]>
Co-authored-by: sallyom <[email protected]>
Co-authored-by: Tyler Yust <[email protected]>
Co-authored-by: ademczuk <[email protected]>
Co-authored-by: Doruk Ardahan <[email protected]>
Co-authored-by: 0xsline <[email protected]>
Co-authored-by: bbblending <[email protected]>
Co-authored-by: Radek Sienkiewicz <[email protected]>
Co-authored-by: velvet-shark <[email protected]>
Co-authored-by: Tyson Cung <[email protected]>
Co-authored-by: obviyus <[email protected]>
Co-authored-by: Kesku <[email protected]>
Co-authored-by: Kyle <[email protected]>
Co-authored-by: Bronko <[email protected]>
Co-authored-by: GazeKingNuWu <[email protected]>
Co-authored-by: scoootscooob <[email protected]>
Co-authored-by: Daniel dos Santos Reis <[email protected]>
Co-authored-by: DevMac <[email protected]>
Co-authored-by: merlin <[email protected]>
Co-authored-by: dimatu <[email protected]>
Co-authored-by: George Kalogirou <[email protected]>
Co-authored-by: rexlunae <[email protected]>
Co-authored-by: opriz <[email protected]>
Co-authored-by: Joshua Lelon Mitchell <[email protected]>
Co-authored-by: dsantoreis <[email protected]>
Co-authored-by: xaeon2026 <[email protected]>
Co-authored-by: Robin Waslander <[email protected]>
Co-authored-by: Pejman Pour-Moezzi <[email protected]>
Co-authored-by: Onur <[email protected]>
Co-authored-by: zerone0x <[email protected]>
Co-authored-by: zerone0x <[email protected]>
