iOS Security Stack 1/5: Keychain Migrations + Tests#33029

Merged
mbelinky merged 8 commits into main from stack/ios-sec-01-keychain-migrations
Mar 3, 2026

@mbelinky mbelinky commented Mar 3, 2026

Part 1/5 of iOS security stack (split from #31979).

Scope

  • Harden Keychain write behavior in iOS Keychain helper
  • Migrate iOS gateway connection metadata from UserDefaults to Keychain
  • Migrate TLS fingerprint pinning storage to Keychain
  • Preserve legacy UserDefaults data when Keychain migration write fails
  • Update iOS tests for Keychain-backed storage

Files

  • apps/ios/Sources/Gateway/KeychainStore.swift
  • apps/ios/Sources/Gateway/GatewaySettingsStore.swift
  • apps/shared/OpenClawKit/Sources/OpenClawKit/GatewayTLSPinning.swift
  • apps/ios/Tests/GatewayConnectionControllerTests.swift
  • apps/ios/Tests/GatewaySettingsStoreTests.swift

Stack

  • Base: main
  • Next PR: Part 2/5 (stack/ios-sec-02-concurrency-locks)

@openclaw-barnacle openclaw-barnacle bot added app: ios App: ios size: M maintainer Maintainer-authored PR labels Mar 3, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e90bc3d7d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

public func makeWebSocketTask(url: URL) -> WebSocketTaskBox {
let task = self.session.webSocketTask(with: url)
- task.maximumMessageSize = 16 * 1024 * 1024
+ task.maximumMessageSize = 4 * 1024 * 1024

P1: Restore 16MB WebSocket receive limit for TLS-pinned sessions

This lowers GatewayTLSPinningSession to a 4MB receive cap, but TLS-enabled connections use this session path and can still receive large snapshot/history frames; the default channel keeps 16MB specifically to avoid "Message too long" failures for those payloads. As a result, secure connections can now fail on responses between 4MB and 16MB that previously worked, creating a regression that only affects TLS/pinned flows.
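
One way to keep the two session paths from drifting apart is to read the cap from a single constant. The sketch below is only an illustration: `WebSocketLimits` and the free function are hypothetical names, and it returns the raw `URLSessionWebSocketTask` rather than the `WebSocketTaskBox` wrapper used in the PR.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical shared constant so pinned and default sessions use the same cap.
enum WebSocketLimits {
    static let maximumMessageSize = 16 * 1024 * 1024 // 16 MiB
}

/// Hypothetical stand-in for makeWebSocketTask(url:); the session is passed in explicitly.
func makeWebSocketTask(session: URLSession, url: URL) -> URLSessionWebSocketTask {
    let task = session.webSocketTask(with: url)
    // Large snapshot/history frames need the higher receive limit.
    task.maximumMessageSize = WebSocketLimits.maximumMessageSize
    return task
}
```

With a shared constant, lowering the limit for one path would have to be an explicit, visible decision.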

let status = SecItemUpdate(query as CFDictionary, update as CFDictionary)
if status == errSecSuccess { return true }
if status != errSecItemNotFound { return false }
SecItemDelete(query as CFDictionary)

P2: Avoid deleting keychain value before replacement succeeds

The new delete-then-add write path can permanently erase an existing secret when SecItemAdd fails (for example, transient keychain availability errors), because the old item is removed first and many call sites ignore the boolean return value. Previously, update-first behavior preserved the existing credential on non-errSecItemNotFound failures, so this change introduces silent credential loss risk for stored gateway/token/password entries.
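
For reference, a minimal sketch of the update-first shape described above, assuming a generic-password item keyed by service and account. `saveStringUpdateFirst` is a hypothetical name, not the PR's `KeychainStore` API; the point is only that no branch deletes the existing item.

```swift
import Foundation
import Security

/// Hypothetical helper: update in place, and add only when the item is missing.
func saveStringUpdateFirst(_ value: String, service: String, account: String) -> Bool {
    let data = Data(value.utf8)
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
        kSecAttrAccount as String: account,
    ]
    let update: [String: Any] = [kSecValueData as String: data]

    let status = SecItemUpdate(query as CFDictionary, update as CFDictionary)
    if status == errSecSuccess { return true }
    // Any failure other than "not found" leaves the stored credential untouched.
    if status != errSecItemNotFound { return false }

    // Nothing stored yet: insert a fresh item.
    var insert = query
    insert[kSecValueData as String] = data
    return SecItemAdd(insert as CFDictionary, nil) == errSecSuccess
}
```

Callers that ignore the returned `Bool` still keep the old secret on failure, which is the property the comment is asking for.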

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR successfully migrates iOS gateway connection metadata and TLS fingerprints from UserDefaults to Keychain-backed storage, implements a sound migration path with fallback preservation, and hardens Keychain write semantics through delete-then-add operations. The GatewayDiagnostics logging is well-integrated with write-counting via OSAllocatedUnfairLock.

However, one security gap remains: applyFileProtection is only called during bootstrap(), but file protection can be silently lost when truncateLogIfNeeded() or appendToLog() create new files atomically. This gap contradicts the PR's explicit goal to harden write behavior. The fix is straightforward: call applyFileProtection inside log() after truncation and atomic writes.

Confidence Score: 3/5

  • The PR is mostly safe to merge, but a file-protection security gap in GatewayDiagnostics should be addressed before shipping.
  • The Keychain migration logic, fallback semantics, test isolation, and delete-then-add write hardening are all well-implemented. The score is reduced because applyFileProtection is not reapplied after atomic writes in log() (a gap in the PR's explicit security hardening goal). This allows file protection to be silently lost when truncation or atomic writes occur, contradicting the hardening scope.
  • apps/ios/Sources/Gateway/GatewaySettingsStore.swift — the log() method must call applyFileProtection() after truncation and atomic file creation.

Last reviewed commit: 2e90bc3

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 1 comment

greptile-apps bot commented Mar 3, 2026

Additional Comments (1)

apps/ios/Sources/Gateway/GatewaySettingsStore.swift
applyFileProtection is only invoked in bootstrap() at line 483. However, truncateLogIfNeeded() uses tail.write(to: url, options: .atomic) at line 451, which creates a brand-new file. The new file starts with iOS's default protection level — not completeUntilFirstUserAuthentication. Since log() can trigger truncation independently (every 50 writes), the protection attribute is silently lost long after bootstrap() ran.

Similarly, appendToLog() at line 465 uses .atomic write when the file doesn't exist, creating an unprotected file if reset() deletes the log and log() is called before bootstrap().

To ensure the protection level is consistently maintained, applyFileProtection should also be called inside log() after truncation and after appendToLog() creates a new file:

if shouldTruncate {
    self.truncateLogIfNeeded(url: url)
    self.applyFileProtection(url: url)
}
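
A rough sketch of what "reapply after every atomic write" could look like, assuming the protection class is set through `FileManager.setAttributes`. `writeProtected` is a hypothetical helper, not a function from the PR, and the protection step is compiled only on iOS.

```swift
import Foundation

/// Hypothetical helper: atomically replace a file, then reapply its protection class.
func writeProtected(_ data: Data, to url: URL) throws {
    // An atomic write replaces the file, so attributes set on the old file do not carry over.
    try data.write(to: url, options: .atomic)
    #if os(iOS)
    try FileManager.default.setAttributes(
        [.protectionKey: FileProtectionType.completeUntilFirstUserAuthentication],
        ofItemAtPath: url.path)
    #endif
}
```

Routing both the truncation path and the create-if-missing path through one helper like this would avoid relying on `bootstrap()` having run first.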

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 185c77c665

Comment on lines +50 to +51
guard GenericPasswordKeychainStore.saveString(existing, service: self.keychainService, account: stableID) else {
return

P1: Preserve legacy TLS pin when Keychain migration fails

When migrateFromUserDefaultsIfNeeded cannot write to Keychain, it exits early and loadFingerprint immediately continues by reading only Keychain, returning nil even though a legacy pin still exists in UserDefaults. That downgrades trust behavior for this session (for example, apps/macos/Sources/OpenClaw/NodeMode/MacNodeModeCoordinator.swift enables TOFU when stored == nil), so a transient Keychain write failure can make a previously pinned gateway behave as unpinned.
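
A possible shape for the fallback, sketched with injected closures in place of the real Keychain calls so it can run anywhere. `PinStore`, its closure parameters, and the `gateway.tls.` key prefix are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical storage seams; the real code talks to Keychain and UserDefaults directly.
struct PinStore {
    var loadKeychain: (String) -> String?          // stableID -> fingerprint
    var saveKeychain: (String, String) -> Bool     // (fingerprint, stableID) -> success
    var legacyDefaults: UserDefaults
    var legacyKeyPrefix = "gateway.tls."           // assumed legacy key format

    func loadFingerprint(stableID: String) -> String? {
        if let stored = loadKeychain(stableID) { return stored }
        let legacyKey = legacyKeyPrefix + stableID
        guard let legacy = legacyDefaults.string(forKey: legacyKey) else { return nil }
        // Drop the legacy entry only once the Keychain write has succeeded.
        if saveKeychain(legacy, stableID) {
            legacyDefaults.removeObject(forKey: legacyKey)
        }
        // Return the legacy pin either way so the gateway stays pinned this session.
        return legacy
    }
}
```

With this shape a transient Keychain failure delays migration but never makes `stored == nil` for a gateway that was already pinned.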

Comment on lines +196 to +200
guard let json = KeychainStore.loadString(
service: self.gatewayService, account: self.lastGatewayConnectionAccount),
let data = json.data(using: .utf8),
let stored = try? JSONDecoder().decode(LastGatewayConnectionData.self, from: data)
else { return nil }

P2: Fall back to legacy last-connection data on migration failure

loadLastGatewayConnection now always decodes from Keychain after calling migration, but migrateLastGatewayFromUserDefaultsIfNeeded leaves legacy defaults in place when saveLastGatewayConnectionData fails. In that failure path this method still returns nil instead of reading the existing gateway.last.* values, which can drop reconnect/onboarding state until a later successful Keychain write.
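
As an illustration of the suggested fallback, the sketch below decodes the Keychain JSON when it is present and otherwise rebuilds the value from legacy defaults. The `LastConnection` fields and the concrete `gateway.last.host` / `gateway.last.port` keys are assumptions; the thread only names the `gateway.last.*` prefix and the real `LastGatewayConnectionData` may look different.

```swift
import Foundation

/// Hypothetical shape; the real LastGatewayConnectionData may differ.
struct LastConnection: Codable, Equatable {
    var host: String
    var port: Int
}

/// keychainJSON is whatever the Keychain read returned, or nil if the read failed.
func loadLastConnection(keychainJSON: String?, legacy: UserDefaults) -> LastConnection? {
    if let json = keychainJSON,
       let stored = try? JSONDecoder().decode(LastConnection.self, from: Data(json.utf8)) {
        return stored
    }
    // Keychain had nothing usable: read the legacy keys that migration left in place.
    // Both key names below are assumed for this sketch.
    guard let host = legacy.string(forKey: "gateway.last.host") else { return nil }
    let port = legacy.integer(forKey: "gateway.last.port")
    guard port > 0 else { return nil }
    return LastConnection(host: host, port: port)
}
```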

@mbelinky mbelinky force-pushed the stack/ios-sec-01-keychain-migrations branch from 92e3ee0 to 8d538be Compare March 3, 2026 15:42

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d538be268

private static let logSizeCheckEveryWrites = 50
- nonisolated(unsafe) private static var logWritesSinceCheck = 0
+ private static let logWritesSinceCheck = OSAllocatedUnfairLock(initialState: 0)
private static let isoFormatter: ISO8601DateFormatter = {

P2: Avoid shared ISO8601 formatter in concurrent logging

GatewayDiagnostics.log can be called from multiple threads, but this change introduced a single static ISO8601DateFormatter that is used before work is queued onto the serial logger queue. Because ISO8601DateFormatter is not Sendable/thread-safe, concurrent calls can race while formatting timestamps, which can lead to intermittent crashes or corrupted log timestamps under parallel gateway/talk events.
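
If the shared formatter is kept, one conservative option is to route every use through a lock. The wrapper below is a sketch under that assumption; `TimestampFormatter` is a hypothetical type, and formatting on the serial logger queue would be another way to get the same guarantee.

```swift
import Foundation

/// Hypothetical wrapper: every access to the shared formatter goes through one lock.
final class TimestampFormatter: @unchecked Sendable {
    private let lock = NSLock()
    private let formatter = ISO8601DateFormatter()

    func string(from date: Date) -> String {
        lock.lock()
        defer { lock.unlock() }
        return formatter.string(from: date)
    }
}
```

A single static instance of this wrapper can then be called from any thread before the log line is queued.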

/// On first Keychain read for a given stableID, move any legacy UserDefaults
/// fingerprint into Keychain and remove the old entry.
private static func migrateFromUserDefaultsIfNeeded(stableID: String) {
guard let defaults = UserDefaults(suiteName: self.legacySuiteName) else { return }

P2: Preserve standard-defaults fallback in TLS pin migration

Legacy TLS pins were previously read from UserDefaults(suiteName: ...) ?? .standard, but migration now returns immediately when the suite cannot be opened. In environments where that suite is unavailable, existing pins stored in .standard are never migrated or read, so loadFingerprint starts returning nil and previously pinned gateways are treated as unpinned after upgrade.
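
The previous lookup behaviour can be kept by resolving the store before migrating instead of returning early. A minimal sketch, with `legacyPinDefaults` as a hypothetical helper name:

```swift
import Foundation

/// Hypothetical helper: resolve the store that may hold legacy pins.
/// Falls back to .standard when the suite cannot be opened, matching the old read path.
func legacyPinDefaults(suiteName: String) -> UserDefaults {
    UserDefaults(suiteName: suiteName) ?? .standard
}
```

Migration would then read from, and later clean up, whichever store this returns.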

@openclaw-barnacle openclaw-barnacle bot added the channel: discord Channel integration: discord label Mar 3, 2026
@mbelinky mbelinky force-pushed the stack/ios-sec-01-keychain-migrations branch from 0b88928 to 8e98cde Compare March 3, 2026 16:10
@mbelinky mbelinky force-pushed the stack/ios-sec-01-keychain-migrations branch from 8e98cde to b37f3a2 Compare March 3, 2026 16:13
@mbelinky mbelinky force-pushed the stack/ios-sec-01-keychain-migrations branch from b37f3a2 to da2f8f6 Compare March 3, 2026 16:14
@mbelinky mbelinky merged commit ec0eb9f into main Mar 3, 2026
10 checks passed
@mbelinky mbelinky deleted the stack/ios-sec-01-keychain-migrations branch March 3, 2026 16:15

mbelinky commented Mar 3, 2026

Merged via squash.

Thanks @mbelinky!

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da2f8f6141

Comment on lines +65 to +66
_ = SecItemDelete(query as CFDictionary)
_ = SecItemAdd(rollback as CFDictionary, nil)

P1: Prevent rollback from clobbering concurrent keychain writes

If two saveString calls race on the same (service, account), one SecItemAdd can fail with a duplicate after the other writer has already inserted the new value. This rollback path then deletes the currently stored item and restores previousData, which can revert the key to stale credentials instead of leaving the latest successful write in place. The regression is specific to overlapping writes, but it can silently persist old gateway secrets when contention happens.
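
One way to sidestep the rollback entirely is to never delete: try the add first, and treat a duplicate as "someone already wrote, overwrite in place". The sketch below also serializes writers with a process-wide lock. `saveWithoutRollback` is a hypothetical name, and the lock only helps if all writers live in the same process, which is an assumption here.

```swift
import Foundation
import Security

/// Process-wide lock (assumption: every writer goes through this function in one process).
private let keychainWriteLock = NSLock()

/// Hypothetical helper: add, or update on duplicate; there is no delete and no rollback.
func saveWithoutRollback(_ data: Data, service: String, account: String) -> Bool {
    keychainWriteLock.lock()
    defer { keychainWriteLock.unlock() }

    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
        kSecAttrAccount as String: account,
    ]
    var insert = query
    insert[kSecValueData as String] = data

    var status = SecItemAdd(insert as CFDictionary, nil)
    if status == errSecDuplicateItem {
        // A value already exists: overwrite it in place rather than deleting and restoring.
        status = SecItemUpdate(
            query as CFDictionary,
            [kSecValueData as String: data] as CFDictionary)
    }
    return status == errSecSuccess
}
```

Under contention the last writer wins, but no path can restore a stale `previousData` over a newer successful write.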

mbelinky commented Mar 3, 2026

Retroactive attribution note: this merged slice includes substantial original iOS keychain migration/hardening work authored by @Rocuts in the PR commit history. Thank you @Rocuts for the core implementation.

dawi369 pushed a commit to dawi369/davis that referenced this pull request Mar 3, 2026
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: da2f8f6
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky
MillionthOdin16 added a commit to MillionthOdin16/openclaw that referenced this pull request Mar 3, 2026
* refactor(media): add shared ffmpeg helpers

* fix(discord): harden voice ffmpeg path and opus fast-path

* fix(ui): ensure GFM tables render in WebChat markdown (#20410)

- Pass gfm:true + breaks:true explicitly to marked.parse() so table
  support is guaranteed even if global setOptions() is bypassed or
  reset by a future refactor (defense-in-depth)
- Add display:block + overflow-x:auto to .chat-text table so wide
  multi-column tables scroll horizontally instead of being clipped
  by the parent overflow-x:hidden chat container
- Add regression tests for GFM table rendering in markdown.test.ts

* fix: webchat gfm table rendering and overflow (#32365) (thanks @BlueBirdBack)

* refactor(tests): dedupe pi embedded test harness

* refactor(tests): dedupe browser and config cli test setup

* fix(test): harden discord lifecycle status sink typing

* fix(gateway): harden message action channel fallback and startup grace

Take the safe, tested subset from #32367:
- per-channel startup connect grace in health monitor
- tool-context channel-provider fallback for message actions

Co-authored-by: Munem Hashmi <[email protected]>

* docs(changelog): add landed notes for #32336 and #32364

* fix(test): tighten tool result typing in context pruning tests

* fix(hooks): propagate run/tool IDs for tool hook correlation (#32360)

* Plugin SDK: add run and tool call fields to tool hooks

* Agents: propagate runId and toolCallId in before_tool_call

* Agents: thread runId through tool wrapper context

* Runner: pass runId into tool hook context

* Compaction: pass runId into tool hook context

* Agents: scope after_tool_call start data by run

* Tests: cover run and tool IDs in before_tool_call hooks

* Tests: add run-scoped after_tool_call collision coverage

* Hooks: scope adjusted tool params by run

* Tests: cover run-scoped adjusted param collisions

* Hooks: preserve active tool start metadata until end

* Changelog: add tool-hook correlation note

* test: fix tsgo baseline test compatibility

* feat(plugin-sdk): Add channelRuntime support for external channel plugins

## Overview

This PR enables external channel plugins (loaded via Plugin SDK) to access
advanced runtime features like AI response dispatching, which were previously
only available to built-in channels.

## Changes

### src/gateway/server-channels.ts
- Import PluginRuntime type
- Add optional channelRuntime parameter to ChannelManagerOptions
- Pass channelRuntime to channel startAccount calls via conditional spread
- Ensures backward compatibility (field is optional)

### src/gateway/server.impl.ts
- Import createPluginRuntime from plugins/runtime
- Create and pass channelRuntime to channel manager

### src/channels/plugins/types.adapters.ts
- Import PluginRuntime type
- Add comprehensive documentation for channelRuntime field
- Document available features, use cases, and examples
- Improve type safety (use imported PluginRuntime type vs inline import)

## Benefits

External channel plugins can now:
- Generate AI-powered responses using dispatchReplyWithBufferedBlockDispatcher
- Access routing, text processing, and session management utilities
- Use command authorization and group policy resolution
- Maintain feature parity with built-in channels

## Backward Compatibility

- channelRuntime field is optional in ChannelGatewayContext
- Conditional spread ensures it's only passed when explicitly provided
- Existing channels without channelRuntime support continue to work unchanged
- No breaking changes to channel plugin API

## Testing

- Email channel plugin successfully uses channelRuntime for AI responses
- All existing built-in channels (slack, discord, telegram, etc.) work unchanged
- Gateway loads and runs without errors when channelRuntime is provided

* fix: add channelRuntime regression coverage (#25462) (thanks @guxiaobo)

* Feishu: cache failing probes (#29970)

* Feishu: cache failing probes

* Changelog: add Feishu probe failure backoff note

---------

Co-authored-by: bmendonca3 <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* refactor(tests): dedupe agent lock and loop detection fixtures

* refactor(tests): dedupe browser and telegram tool test fixtures

* refactor(tests): dedupe web fetch and embedded tool hook fixtures

* refactor(tests): dedupe cron delivered status assertions

* refactor(outbound): unify channel selection and action input normalization

* refactor(gateway): extract channel health policy and timing aliases

* refactor(feishu): expose default-account selection source

* plugin-sdk: expose onAgentEvent + onSessionTranscriptUpdate via PluginRuntime.events

* fix: wrap transcript event listeners in try/catch to prevent throw propagation

* style: fix indentation in transcript-events

* fix: add missing events property to bluebubbles PluginRuntime mock

* docs: add JSDoc to onSessionTranscriptUpdate

* docs: expand JSDoc for onSessionTranscriptUpdate params and return

* Fix styles

* fix: add runtime.events regression tests (#16044) (thanks @scifantastic)

* feat(hooks): add trigger and channelId to plugin hook agent context (#28623)

* feat(hooks): add trigger and channelId to plugin hook agent context

Adds `trigger` and `channelId` fields to `PluginHookAgentContext` so
plugins can determine what initiated the agent run and which channel
it originated from, without session-key parsing or Redis bridging.

trigger values: "user", "heartbeat", "cron", "memory"
channelId values: "telegram", "discord", "whatsapp", etc.

Both fields are threaded through run.ts and attempt.ts hookCtx so all
hook phases receive them (before_model_resolve, before_prompt_build,
before_agent_start, llm_input, llm_output, agent_end).

channelId falls back from messageChannel to messageProvider when the
former is not set. followup-runner passes originatingChannel so queued
followup runs also carry channel context.

* docs(changelog): note hook context parity fix for #28623

---------

Co-authored-by: Vincent Koc <[email protected]>

* feat(plugins): expose requestHeartbeatNow on plugin runtime

Add requestHeartbeatNow to PluginRuntime.system so extensions can
trigger an immediate heartbeat wake without importing internal modules.

This enables extensions to inject a system event and wake the agent
in one step — useful for inbound message handlers that use the
heartbeat model (e.g. agent-to-agent DMs via Nostr).

Changes:
- src/plugins/runtime/types.ts: add RequestHeartbeatNow type alias
  and requestHeartbeatNow to PluginRuntime.system
- src/plugins/runtime/index.ts: import and wire requestHeartbeatNow
  into createPluginRuntime()

* fix: add requestHeartbeatNow to bluebubbles test mock

* fix: add requestHeartbeatNow runtime coverage (#19464) (thanks @AustinEral)

* fix: force supportsDeveloperRole=false for non-native OpenAI endpoints (#29479)

Merged via squash.

Prepared head SHA: 1416c584ac4cdc48af9f224e3d870ef40900c752
Co-authored-by: akramcodez <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras

* test(agents): tighten pi message typing and dedupe malformed tool-call cases

* refactor(test): extract cron issue-regression harness and frozen-time helper

* fix(test): resolve upstream typing drift in feishu and cron suites

* fix(security): pin tlon api source and secure hold music url

* Plugins: add sessionKey to session lifecycle hooks

* fix: add session hook context regression tests (#26394) (thanks @tempeste)

* refactor(tests): dedupe isolated agent cron turn assertions

* refactor(tests): dedupe cron store migration setup

* refactor(tests): dedupe discord monitor e2e fixtures

* refactor(tests): dedupe manifest registry link fixture setup

* refactor(tests): dedupe security fix scenario helpers

* refactor(tests): dedupe media transcript echo config setup

* refactor(tests): dedupe tools invoke http request helpers

* feat(memory): add Ollama embedding provider (#26349)

Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: ac413865431614c352c3b29f2dfccc5593f0605a
Co-authored-by: nico-hoff <[email protected]>
Co-authored-by: gumadeiras <[email protected]>
Reviewed-by: @gumadeiras

* fix(sessions): preserve idle reset timestamp on inbound metadata

* fix: preserve idle reset timestamp on inbound metadata writes (#32379) (thanks @romeodiaz)

* fix(slack): fail fast on non-recoverable auth errors instead of retry loop

When a Slack bot is removed from a workspace while still configured in
OpenClaw, the gateway enters an infinite retry loop on account_inactive
or invalid_auth errors, making the entire gateway unresponsive.

Add isNonRecoverableSlackAuthError() to detect permanent credential
failures (account_inactive, invalid_auth, token_revoked, etc.) and
throw immediately instead of retrying.  This mirrors how the Telegram
provider already distinguishes recoverable network errors from fatal
auth errors via isRecoverableTelegramNetworkError().

The check is applied in both the startup catch block and the disconnect
reconnect path so stale credentials always fail fast with a clear error
message.

Closes #32366

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix: fail fast on non-recoverable slack auth errors (#32377) (thanks @scoootscooob)

* fix(cli): fail fast on unsupported Node versions in install and runtime paths

Surface a clear Node 22.12+ requirement before npm/install bootstrap work so users avoid misleading downstream errors.

- Add installer shell preflight to block active Node <22 and suggest NVM recovery commands
- Add openclaw.mjs runtime preflight for npm/npx usage with explicit Node version guidance
- Keep messaging actionable for both NVM and non-NVM environments

* fix(cli): align Node 22.12 preflight checks and clean runtime guard output

Tighten installer/runtime consistency so users on Node 22.0-22.11 are blocked before install/runtime drift, with cleaner CLI guidance.

- Enforce Node >=22.12 in scripts/install.sh preflight checks
- Align installer messages to the same 22.12+ runtime floor
- Replace openclaw.mjs thrown version error with stderr+exit to avoid noisy stack traces

* fix: enforce node v22.12+ preflight for installer and runtime (#32356) (thanks @jasonhargrove)

* fix(webui): prevent inline code from breaking mid-token on copy/paste

The parent `.chat-text` applies `overflow-wrap: anywhere; word-break: break-word;`
which forces long tokens (UUIDs, hashes) inside inline `<code>` to break across
visual lines. When copied, the browser injects spaces at those break points,
corrupting the pasted value.

Override with `overflow-wrap: normal; word-break: keep-all;` on inline `<code>`
selectors so tokens stay intact.

Fixes #32230

Signed-off-by: HCL <[email protected]>

* fix: preserve inline code copy fidelity in web ui (#32346) (thanks @hclsys)

* refactor: modularize plugin runtime and test hooks

* fix(agents): treat host edit tool as success when file contains newText after upstream throw (fixes #32333)

* fix(agents): only recover edit when oldText no longer in file (review feedback)

* fix: recover host edit success after post-write upstream throw (#32383) (thanks @polooooo)

* CLI: unify routed config positional parsing

* test(agents): centralize AgentMessage fixtures and remove unsafe casts

* test(security): reduce audit fixture setup overhead

* test(telegram): dedupe streaming cases and tighten sequential key checks

* refactor(agents): split pi-tools param and host-edit wrappers

* refactor(sessions): add explicit merge activity policies

* refactor(slack): extract socket reconnect policy helpers

* refactor(runtime): unify node version guard parsing

* refactor(ui): dedupe inline code wrap rules

* fix: resolve pi-tools typing regressions

* refactor(telegram): extract sequential key module

* test(telegram): move preview-finalization cases to lane unit tests

* perf(agents): cache per-pass context char estimates

* perf(security): allow audit snapshot and summary cache reuse

* fix(docker): correct awk quoting in Docker GPG fingerprint check (#32153)

* fix(acp): use publishable acpx install hint

* Agents: add context metadata warmup retry backoff

* fix(exec): suggest increasing timeout on timeouts

* fix(gemini-cli-auth): use PLATFORM_UNSPECIFIED for Linux in loadCodeAssist

Google's loadCodeAssist API rejects "LINUX" as an invalid Platform enum
value, causing OAuth setup to fail with 400 Bad Request on Linux systems.

The pi-ai runtime already uses "PLATFORM_UNSPECIFIED" for this field.
This aligns the extension's discoverProject() with that approach by
returning "PLATFORM_UNSPECIFIED" for Linux (and other non-Windows/macOS
platforms) instead of "LINUX".

Also fixes the original resolvePlatform() which incorrectly fell through
to "MACOS" as default instead of explicitly checking for "darwin".

* chore: remove unreachable "LINUX" from resolvePlatform return type

Address review feedback: since resolvePlatform() no longer returns
"LINUX", remove it from the union type to prevent future confusion.

* fix(agents): recognize connection errors as retryable timeout failures (#31697)

* fix(agents): recognize connection errors as retryable timeout failures

## Problem

When a model endpoint becomes unreachable (e.g., local proxy down,
relay server offline), the failover system fails to switch to the
next candidate model. Errors like "Connection error." are not
classified as retryable, causing the session to hang on a broken
endpoint instead of falling back to healthy alternatives.

## Root Cause

Connection/network errors are not recognized by the current failover
classifier:
- Text patterns like "Connection error.", "fetch failed", "network error"
- Error codes like ECONNREFUSED, ENOTFOUND, EAI_AGAIN (in message text)

While `failover-error.ts` handles these as error codes (err.code),
it misses them when they appear as plain text in error messages.

## Solution

Extend timeout error patterns to include connection/network failures:

**In `errors.ts` (ERROR_PATTERNS.timeout):**
- Text: "connection error", "network error", "fetch failed", etc.
- Regex: /\beconn(?:refused|reset|aborted)\b/i, /\benotfound\b/i, /\beai_again\b/i

**In `failover-error.ts` (TIMEOUT_HINT_RE):**
- Same patterns for non-assistant error paths

## Testing

Added test cases covering:
- "Connection error."
- "fetch failed"
- "network error: ECONNREFUSED"
- "ENOTFOUND" / "EAI_AGAIN" in message text

## Impact

- **Compatibility:** High - only expands retryable error detection
- **Behavior:** Connection failures now trigger automatic fallback
- **Risk:** Low - changes are additive and well-tested

* style: fix code formatting for test file

* refactor: split plugin runtime type contracts

* refactor: extract session init helpers

* ci: move changed-scope logic into tested script

* test: consolidate extension runtime mocks and split bluebubbles webhook auth suite

* fix(voice-call): prevent EADDRINUSE by guarding webhook server lifecycle

Three issues caused the port to remain bound after partial failures:

1. VoiceCallWebhookServer.start() had no idempotency guard — calling it
   while the server was already listening would create a second server on
   the same port.

2. createVoiceCallRuntime() did not clean up the webhook server if a step
   after webhookServer.start() failed (e.g. manager.initialize). The
   server kept the port bound while the runtime promise rejected.

3. ensureRuntime() cached the rejected promise forever, so subsequent
   calls would re-throw the same error without ever retrying. Combined
   with (2), the port stayed orphaned until gateway restart.

Fixes #32387

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(voice-call): harden webhook lifecycle cleanup and retries (#32395) (thanks @scoootscooob)

* fix: unblock build type errors

* ci: scale Windows CI runner and test workers

* refactor(test): dedupe telegram draft-stream fixtures

* refactor(telegram): split lane preview target helpers

* refactor(agents): split tool-result char estimator

* refactor(security): centralize audit execution context

* fix(scripts/pr): SSH-first prhead remote with GraphQL fallback for fork PRs (#32126)

Co-authored-by: Shakker <[email protected]>

* ci: use valid Blacksmith Windows runner label

* CI: add sticky-disk toggle to setup node action

* CI: add sticky-disk mode to pnpm cache action

* CI: enable sticky-disk pnpm cache on Linux CI jobs

* CI: migrate docker release build cache to Blacksmith

* CI: use Blacksmith docker builder in install smoke

* CI: use Blacksmith docker builder in sandbox smoke

* refactor(agents): share failover error matchers

* refactor(cli): extract plugin install plan helper

* refactor(voice-call): unify runtime cleanup lifecycle

* refactor(acp): extract install hint resolver

* refactor(tests): dedupe control ui auth pairing fixtures

* refactor(tests): dedupe gateway chat history fixtures

* refactor(memory): dedupe openai batch fetch flows

* refactor(tests): dedupe model compat assertions

* refactor(acp): dedupe runtime option command plumbing

* refactor(config): dedupe repeated zod schema shapes

* refactor(browser): dedupe playwright interaction helpers

* refactor(tests): dedupe openresponses http fixtures

* refactor(tests): dedupe agent handler test scaffolding

* refactor(tests): dedupe session store route fixtures

* refactor(tests): dedupe canvas host server setup

* refactor(security): dedupe telegram allowlist validation loops

* refactor(config): dedupe session store save error handling

* refactor(gateway): dedupe agents server-method handlers

* refactor(infra): dedupe exec approval allowlist evaluation flow

* refactor(infra): dedupe ssrf fetch guard test fixtures

* refactor(infra): dedupe update startup test setup

* refactor(memory): dedupe readonly recovery test scenarios

* refactor(agents): dedupe ollama provider test scaffolding

* refactor(agents): dedupe steer restart test replacement flow

* refactor(telegram): dedupe monitor retry test helpers

* feat(secrets): expand SecretRef coverage across user-supplied credentials (#29580)

* feat(secrets): expand secret target coverage and gateway tooling

* docs(secrets): align gateway and CLI secret docs

* chore(protocol): regenerate swift gateway models for secrets methods

* fix(config): restore talk apiKey fallback and stabilize runner test

* ci(windows): reduce test worker count for shard stability

* ci(windows): raise node heap for test shard stability

* test(feishu): make proxy env precedence assertion windows-safe

* fix(gateway): resolve auth password SecretInput refs for clients

* fix(gateway): resolve remote SecretInput credentials for clients

* fix(secrets): skip inactive refs in command snapshot assignments

* fix(secrets): scope gateway.remote refs to effective auth surfaces

* fix(secrets): ignore memory defaults when enabled agents disable search

* fix(secrets): honor Google Chat serviceAccountRef inheritance

* fix(secrets): address tsgo errors in command and gateway collectors

* fix(secrets): avoid auth-store load in providers-only configure

* fix(gateway): defer local password ref resolution by precedence

* fix(secrets): gate telegram webhook secret refs by webhook mode

* fix(secrets): gate slack signing secret refs to http mode

* fix(secrets): skip telegram botToken refs when tokenFile is set

* fix(secrets): gate discord pluralkit refs by enabled flag

* fix(secrets): gate discord voice tts refs by voice enabled

* test(secrets): make runtime fixture modes explicit

* fix(cli): resolve local qr password secret refs

* fix(cli): fail when gateway leaves command refs unresolved

* fix(gateway): fail when local password SecretRef is unresolved

* fix(gateway): fail when required remote SecretRefs are unresolved

* fix(gateway): resolve local password refs only when password can win

* fix(cli): skip local password SecretRef resolution on qr token override

* test(gateway): cast SecretRef fixtures to OpenClawConfig

* test(secrets): activate mode-gated targets in runtime coverage fixture

* fix(cron): support SecretInput webhook tokens safely

* fix(bluebubbles): support SecretInput passwords across config paths

* fix(msteams): make appPassword SecretInput-safe in onboarding/token paths

* fix(bluebubbles): align SecretInput schema helper typing

* fix(cli): clarify secrets.resolve version-skew errors

* refactor(secrets): return structured inactive paths from secrets.resolve

* refactor(gateway): type onboarding secret writes as SecretInput

* chore(protocol): regenerate swift models for secrets.resolve

* feat(secrets): expand extension credential secretref support

* fix(secrets): gate web-search refs by active provider

* fix(onboarding): detect SecretRef credentials in extension status

* fix(onboarding): allow keeping existing ref in secret prompt

* fix(onboarding): resolve gateway password SecretRefs for probe and tui

* fix(onboarding): honor secret-input-mode for local gateway auth

* fix(acp): resolve gateway SecretInput credentials

* fix(secrets): gate gateway.remote refs to remote surfaces

* test(secrets): cover pattern matching and inactive array refs

* docs(secrets): clarify secrets.resolve and remote active surfaces

* fix(bluebubbles): keep existing SecretRef during onboarding

* fix(tests): resolve CI type errors in new SecretRef coverage

* fix(extensions): replace raw fetch with SSRF-guarded fetch

* test(secrets): mark gateway remote targets active in runtime coverage

* test(infra): normalize home-prefix expectation across platforms

* fix(cli): only resolve local qr password refs in password mode

* test(cli): cover local qr token mode with unresolved password ref

* docs(cli): clarify local qr password ref resolution behavior

* refactor(extensions): reuse sdk SecretInput helpers

* fix(wizard): resolve onboarding env-template secrets before plaintext

* fix(cli): surface secrets.resolve diagnostics in memory and qr

* test(secrets): repair post-rebase runtime and fixtures

* fix(gateway): skip remote password ref resolution when token wins

* fix(secrets): treat tailscale remote gateway refs as active

* fix(gateway): allow remote password fallback when token ref is unresolved

* fix(gateway): ignore stale local password refs for none and trusted-proxy

* fix(gateway): skip remote secret ref resolution on local call paths

* test(cli): cover qr remote tailscale secret ref resolution

* fix(secrets): align gateway password active-surface with auth inference

* fix(cli): resolve inferred local gateway password refs in qr

* fix(gateway): prefer resolvable remote password over token ref pre-resolution

* test(gateway): cover none and trusted-proxy stale password refs

* docs(secrets): sync qr and gateway active-surface behavior

* fix: restore stability blockers from pre-release audit

* Secrets: fix collector/runtime precedence contradictions

* docs: align secrets and web credential docs

* fix(rebase): resolve integration regressions after main rebase

* fix(node-host): resolve gateway secret refs for auth

* fix(secrets): harden secretinput runtime readers

* gateway: skip inactive auth secretref resolution

* cli: avoid gateway preflight for inactive secret refs

* extensions: allow unresolved refs in onboarding status

* tests: fix qr-cli module mock hoist ordering

* Security: align audit checks with SecretInput resolution

* Gateway: resolve local-mode remote fallback secret refs

* Node host: avoid resolving inactive password secret refs

* Secrets runtime: mark Slack appToken inactive for HTTP mode

* secrets: keep inactive gateway remote refs non-blocking

* cli: include agent memory secret targets in runtime resolution

* docs(secrets): sync docs with active-surface and web search behavior

* fix(secrets): keep telegram top-level token refs active for blank account tokens

* fix(daemon): resolve gateway password secret refs for probe auth

* fix(secrets): skip IRC NickServ ref resolution when NickServ is disabled

* fix(secrets): align token inheritance and exec timeout defaults

* docs(secrets): clarify active-surface notes in cli docs

* cli: require secrets.resolve gateway capability

* gateway: log auth secret surface diagnostics

* secrets: remove dead provider resolver module

* fix(secrets): restore gateway auth precedence and fallback resolution

* fix(tests): align plugin runtime mock typings

---------

Co-authored-by: Peter Steinberger <[email protected]>

* CI: optimize Windows lane by splitting bundle and dropping duplicate lanes

* docs: reorder unreleased changelog by user interest

* fix(e2e): include shared tool display resource in onboard docker build

* fix(ci): tighten type signatures in gateway params validation

* fix(telegram): move unchanged command-sync log to verbose

* test: fix strict runtime mock types in channel tests

* test: load ci changed-scope script via esm import

* fix(swift): align async helper callsites across iOS and macOS

* refactor(macos): simplify pairing alert and host helper paths

* style(swift): apply lint and format cleanup

* CI: gate Windows checks by windows-relevant scope (#32456)

* CI: add windows scope output for changed-scope

* Test: cover windows scope gating in changed-scope

* CI: gate checks-windows by windows scope

* Docs: update CI windows scope and runner label

* CI: move checks-windows to 32 vCPU runner

* Docs: align CI windows runner with workflow

* CI: allow blacksmith 32 vCPU Windows runner in actionlint

* chore(gitignore): ignore android kotlin cache

* fix(ci): restore scope-test require import and sync host policy

* docs(changelog): add SecretRef note for #29580

* fix(venice): retry model discovery on transient fetch failures

* fix(feishu): preserve block streaming text when final payload is missing (#30663)

* fix(feishu): preserve block streaming text when final payload is missing

When Feishu card streaming receives block payloads without matching final/partial
callbacks, keep block text in stream state so onIdle close still publishes the
reply instead of an empty message. Add a regression test for block-only streaming.

Closes #30628

* Feishu: preserve streaming block fallback when final text is missing

---------

Co-authored-by: Tak Hoffman <[email protected]>

* CI: add exact-key mode for pnpm cache restore

* CI: reduce pre-test Windows setup latency

* CI: increase checks-windows test shards to 3

* CI: increase checks-windows test shards to 4

* docs(feishu): Feishu docs – add verificationToken and align zh-CN with EN (openclaw#31555) thanks @xbsheng

Verified:
- pnpm build
- pnpm test:macmini
- pnpm check (blocked locally by pre-existing mainline lint issue in src/scripts/ci-changed-scope.test.ts unrelated to this PR)

Co-authored-by: xbsheng <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* fix(ci): avoid shell interpolation in changed-scope git diff

* test(ci): add changed-scope shell-injection regression

* fix(gateway): retry exec-read live tool probe

* feat(feishu): add broadcast support for multi-agent groups (#29575)

* feat(feishu): add broadcast support for multi-agent group observation

When multiple agents share a Feishu group chat, only the @mentioned
agent receives the message. This prevents observer agents from building
session memory of group activity they weren't directly addressed in.

Adds broadcast support (reusing the same cfg.broadcast schema as
WhatsApp) so all configured agents receive every group message in their
session transcripts. Only the @mentioned agent responds on Feishu;
observer agents process silently via no-op dispatchers.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): guard sequential broadcast dispatch against single-agent failure

Wrap each dispatchForAgent() call in the sequential loop with try/catch
so one agent's dispatch failure doesn't abort delivery to remaining agents.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): avoid duplicate messages in broadcast observer mode and normalize agent IDs

- Skip recordPendingHistoryEntryIfEnabled for broadcast groups when not
  mentioned, since the message is dispatched directly to all agents.
  Previously the message appeared twice in the agent prompt.
- Normalize agent IDs with toLowerCase() before membership checks so
  config casing mismatches don't silently skip valid agents.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): set WasMentioned per-agent and normalize broadcast IDs

- buildCtxPayloadForAgent now takes a wasMentioned parameter so active
  agents get WasMentioned=true and observers get false (P1 fix)
- Normalize broadcastAgents to lowercase at resolution time and
  lowercase activeAgentId so all comparisons and session key generation
  use canonical IDs regardless of config casing (P2 fix)

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): canonicalize broadcast agent IDs with normalizeAgentId

* fix(feishu): match ReplyDispatcher sync return types for noop dispatcher

The upstream ReplyDispatcher changed sendToolResult/sendBlockReply/
sendFinalReply to synchronous (returning boolean). Update the broadcast
observer noop dispatcher to match.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): deduplicate broadcast agent IDs after normalization

Config entries like "Main" and "main" collapse to the same canonical ID
after normalizeAgentId but were dispatched multiple times. Use Set to
deduplicate after normalization.
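
As a minimal sketch of the dedup step: `normalizeAgentIdSketch` is a hypothetical stand-in (trim plus lowercase) for the real `normalizeAgentId`, whose exact rules are not shown in this commit.

```typescript
// Hypothetical stand-in for normalizeAgentId; the real rules may differ.
function normalizeAgentIdSketch(id: string): string {
  return id.trim().toLowerCase();
}

// Normalizes configured IDs and drops duplicates while keeping first-seen order.
function resolveBroadcastAgentsSketch(configured: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of configured) {
    const id = normalizeAgentIdSketch(raw);
    if (id.length === 0 || seen.has(id)) continue;
    seen.add(id);
    result.push(id);
  }
  return result;
}
```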

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): honor requireMention=false when selecting broadcast responder

When requireMention is false, the routed agent should be active (reply
on Feishu) even without an explicit @mention. Previously activeAgentId
was null whenever ctx.mentionedBot was false, so all agents got the
noop dispatcher and no reply was sent — silently breaking groups that
disabled mention gating.

Hoist requireMention out of the if(isGroup) block so it's accessible
in the dispatch code.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): cross-account broadcast dedup to prevent duplicate dispatches

In multi-account Feishu setups, the same message event is delivered to
every bot account in a group. Without cross-account dedup, each account
independently dispatches broadcast agents, causing 2×N dispatches instead
of N (where N = number of broadcast agents).

Two changes:
1. requireMention=true + bot not mentioned: return early instead of
   falling through to broadcast. The mentioned bot's handler will
   dispatch for all agents. Non-mentioned handlers record to history.
2. Add cross-account broadcast dedup using a shared 'broadcast' namespace
   (tryRecordMessagePersistent). The first handler to reach the broadcast
   block claims the message; subsequent accounts skip. This handles the
   requireMention=false multi-account case.
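
The claim-then-skip idea in change 2 can be illustrated with an in-memory set. This is only a sketch: the commit uses a persistent store via `tryRecordMessagePersistent`, whereas `tryClaimBroadcastSketch` and `handleGroupMessageSketch` below are hypothetical and keep state in process memory.

```typescript
// In-memory stand-in for the shared 'broadcast' dedup namespace.
const claimedBroadcasts = new Set<string>();

// Returns true only for the first caller that claims a given message.
function tryClaimBroadcastSketch(messageId: string): boolean {
  const key = "broadcast:" + messageId;
  if (claimedBroadcasts.has(key)) return false;
  claimedBroadcasts.add(key);
  return true;
}

// Each bot account's handler calls this for the same inbound event.
// Only the first account to claim the message dispatches to the agents.
function handleGroupMessageSketch(
  accountId: string,
  messageId: string,
  agents: string[],
): string[] {
  if (!tryClaimBroadcastSketch(messageId)) return [];
  return agents.map((agent) => accountId + "->" + agent);
}
```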

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): strip CommandAuthorized from broadcast observer contexts

Broadcast observer agents inherited CommandAuthorized from the sender,
causing slash commands (e.g. /reset) to silently execute on every observer
session. Now only the active agent retains CommandAuthorized; observers
have it stripped before dispatch.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): use actual mention state for broadcast WasMentioned

The active broadcast agent's WasMentioned was set to true whenever
requireMention=false, even when the bot was not actually @mentioned.
Now uses ctx.mentionedBot && agentId === activeAgentId, consistent
with the single-agent path which passes ctx.mentionedBot directly.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(feishu): skip history buffer for broadcast accounts and log parallel failures

1. In requireMention groups with broadcast, non-mentioned accounts no
   longer buffer pending history — the mentioned handler's broadcast
   dispatch already writes turns into all agent sessions. Buffering
   caused duplicate replay via buildPendingHistoryContextFromMap.

2. Parallel broadcast dispatch now inspects Promise.allSettled results
   and logs rejected entries, matching the sequential path's per-agent
   error logging.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* Changelog: note Feishu multi-agent broadcast dispatch

* Changelog: restore author credit for Feishu broadcast entry

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* chore(release): prepare 2026.3.2-beta.1

* fix(ci): complete feishu route mock typing in broadcast tests

* Delete changelog/fragments directory

* refactor(feishu): unify Lark SDK error handling with LarkApiError (#31450)

* refactor(feishu): unify Lark SDK error handling with LarkApiError

- Add LarkApiError class with code, api, and context fields for better diagnostics
- Add ensureLarkSuccess helper to replace 9 duplicate error check patterns
- Update tool registration layer to return structured error info (code, api, context)
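
A minimal sketch of what such a pair could look like, assuming SDK responses carry a numeric `code` (0 on success) and an optional `msg`. The constructor signature and the `LarkResponseLike` type are assumptions for illustration, not the actual implementation.

```typescript
// Assumed response shape: code 0 (or absent) means success.
interface LarkResponseLike {
  code?: number;
  msg?: string;
}

class LarkApiError extends Error {
  readonly code: number;
  readonly api: string;
  readonly context: Record<string, unknown>;

  constructor(
    code: number,
    api: string,
    message: string,
    context: Record<string, unknown> = {},
  ) {
    super(api + " failed (code " + code + "): " + message);
    // Keep the prototype chain intact when compiled to older targets.
    Object.setPrototypeOf(this, LarkApiError.prototype);
    this.name = "LarkApiError";
    this.code = code;
    this.api = api;
    this.context = context;
  }
}

// Throws a LarkApiError for non-zero codes; otherwise returns the response.
function ensureLarkSuccess<T extends LarkResponseLike>(
  res: T,
  api: string,
  context: Record<string, unknown> = {},
): T {
  if (res.code !== undefined && res.code !== 0) {
    throw new LarkApiError(res.code, api, res.msg || "unknown error", context);
  }
  return res;
}
```

A call site then shrinks to a single line such as `ensureLarkSuccess(res, "bitable.app.get", { appToken })`, where `appToken` is whatever request context the caller wants attached.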

This improves:
- Observability: errors now include API name and request context for easier debugging
- Maintainability: single point of change for error handling logic
- Extensibility: foundation for retry strategies, error classification, etc.

Affected APIs:
- wiki.space.getNode
- bitable.app.get
- bitable.app.create
- bitable.appTableField.list
- bitable.appTableField.create
- bitable.appTableRecord.list
- bitable.appTableRecord.get
- bitable.appTableRecord.create
- bitable.appTableRecord.update

* Changelog: note Feishu bitable error handling unification

---------

Co-authored-by: echoVic <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* fix(gateway): skip google rate limits in live suite

* CI: add toggle to skip pnpm actions cache restore

* CI: shard Windows tests into sixths and skip cache restore

* chore(release): update appcast for 2026.3.2-beta.1

* fix(feishu): correct invalid scope name in permission grant URL (#32509)

* fix(feishu): correct invalid scope name in permission grant URL

The Feishu API returns error code 99991672 with an authorization URL
containing the non-existent scope `contact:contact.base:readonly`
when the `contact.user.get` endpoint is called without the correct
permission. The valid scope is `contact:user.base:readonly`.

Add a scope correction map that replaces known incorrect scope names
in the extracted grant URL before presenting it to the user/agent,
so the authorization link actually works.
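
The correction step might be sketched like this. `SCOPE_CORRECTIONS` and `correctGrantUrlScopes` are hypothetical names, the map holds only the one pair mentioned above, and the sketch does not handle percent-encoded scope names.

```typescript
// Known-bad scope name -> valid scope name (only the pair from this commit).
const SCOPE_CORRECTIONS: Record<string, string> = {
  "contact:contact.base:readonly": "contact:user.base:readonly",
};

// Replaces every literal occurrence of a known-bad scope in the grant URL.
function correctGrantUrlScopes(url: string): string {
  let fixed = url;
  for (const wrong of Object.keys(SCOPE_CORRECTIONS)) {
    const right = SCOPE_CORRECTIONS[wrong];
    if (right === undefined) continue;
    // split/join performs a literal replace-all without regex escaping.
    fixed = fixed.split(wrong).join(right);
  }
  return fixed;
}
```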

Closes #31761

* chore(changelog): note feishu scope correction

---------

Co-authored-by: SidQin-cyber <[email protected]>

* docs: add dedicated pdf tool docs page

* feishu, line: pass per-group systemPrompt to inbound context (#31713)

* feishu: pass per-group systemPrompt to inbound context

The Feishu extension schema supports systemPrompt in per-group config
(channels.feishu.accounts.<id>.groups.<groupId>.systemPrompt) but the
value was never forwarded to the inbound context as GroupSystemPrompt.

This means per-group system prompts configured for Feishu had no effect,
unlike IRC, Discord, Slack, Telegram, Matrix, and other channels that
already pass this field correctly.

Co-authored-by: Copilot <[email protected]>

* line: pass per-group systemPrompt to inbound context

Same issue as feishu: the Line config schema defines systemPrompt in
per-group config but the value was never forwarded as GroupSystemPrompt
in the inbound context payload.

Added resolveLineGroupSystemPrompt helper that mirrors the existing
resolveLineGroupConfig lookup logic (groupId > roomId > wildcard).
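
The lookup order can be illustrated as below. This is a sketch under assumptions: the wildcard key is taken to be `"*"`, the first matching entry wins even if it has no `systemPrompt`, and `resolveGroupSystemPromptSketch` is a hypothetical name rather than the real helper.

```typescript
interface GroupConfigLike {
  systemPrompt?: string;
}

// Looks up per-group config in the order groupId > roomId > wildcard ("*").
function resolveGroupSystemPromptSketch(
  groups: Record<string, GroupConfigLike> | undefined,
  ids: { groupId?: string; roomId?: string },
): string | undefined {
  if (!groups) return undefined;
  const candidates: Array<string | undefined> = [ids.groupId, ids.roomId, "*"];
  for (const key of candidates) {
    const config = key !== undefined ? groups[key] : undefined;
    if (config) return config.systemPrompt;
  }
  return undefined;
}
```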

Co-authored-by: Copilot <[email protected]>

* Changelog: note Feishu and LINE group systemPrompt propagation

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* docs(changelog): remove docs-only 2026.3.2 entries

* CI: speed up Windows dependency warmup

* CI: make node deps install optional in setup action

* CI: reduce critical path for check build and windows jobs

* fix(feishu): non-blocking WS ACK and preserve full streaming card content (#29616)

* fix(feishu): non-blocking ws ack and preserve streaming card full content

* fix(feishu): preserve fragmented streaming text without newline artifacts

---------

Co-authored-by: Tak Hoffman <[email protected]>

* fix: add session-memory hook support for Feishu provider (#31437)

* fix: add session-memory hook support for Feishu provider

Issue #31275: Session-memory hook not triggered when using /new command in Feishu

- Added command handler to Feishu provider
- Integrated with OpenClaw's before_reset hook system
- Ensures session memory is saved when /new or /reset commands are used

* Changelog: note Feishu session-memory hook parity

---------

Co-authored-by: Tak Hoffman <[email protected]>

* fix(feishu): guard against false-positive @mentions in multi-app groups (#30315)

* fix(feishu): guard against false-positive @mentions in multi-app groups

When multiple Feishu bot apps share a group chat, Feishu's WebSocket
event delivery remaps the open_id in mentions[] per-app. This causes
checkBotMentioned() to return true for ALL bots when only one was
actually @mentioned, making requireMention ineffective.

Add a botName guard: if the mention's open_id matches this bot but the
mention's display name differs from this bot's configured botName, treat
it as a false positive and skip.
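
A minimal sketch of the guard, assuming each mention exposes an `openId` and a display `name`. The `MentionLike` shape and `isBotMentionedSketch` are hypothetical; the real `checkBotMentioned()` signature is not shown here.

```typescript
interface MentionLike {
  openId: string;
  name: string;
}

// True only when a mention matches this bot's open_id and, if a bot name
// is known, also carries that display name.
function isBotMentionedSketch(
  mentions: MentionLike[],
  botOpenId: string,
  botName?: string,
): boolean {
  return mentions.some((mention) => {
    if (mention.openId !== botOpenId) return false;
    // Guard: the ID matches but the name differs, so this is likely a
    // remapped mention that was really aimed at another bot.
    if (botName && mention.name !== botName) return false;
    return true;
  });
}
```

When no `botName` is available the sketch falls back to ID-only matching, which keeps the previous behavior.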

botName is already available via account.config.botName (set during
onboarding).

Closes #24249

* fix(feishu): support @all mention in multi-bot groups

When a user sends @all (@_all in Feishu message content), treat it as
mentioning every bot so all agents respond when requireMention is true.

Feishu's @all does not populate the mentions[] array, so this needs
explicit content-level detection.

* fix(feishu): auto-fetch bot display name from API for reliable mention matching

Instead of relying on the manually configured botName (which may differ
from the actual Feishu bot display name), fetch the bot's display name
from the Feishu API at startup via probeFeishu().

This ensures checkBotMentioned() always compares against the correct
display name, even when the config botName doesn't match (e.g. config
says 'Wanda' but Feishu shows '绯红女巫').

Changes:
- monitor.ts: fetchBotOpenId → fetchBotInfo (returns both openId and name)
- monitor.ts: store botNames map, pass botName to handleFeishuMessage
- bot.ts: accept botName from params, prefer it over config fallback

* Changelog: note Feishu multi-app mention false-positive guard

---------

Co-authored-by: Teague Xiao <[email protected]>
Co-authored-by: Tak Hoffman <[email protected]>

* CI: start push test lanes earlier and drop check gating

* CI: disable flaky sticky disk mount for Windows pnpm setup

* chore(release): cut 2026.3.2

* fix(feishu): normalize all mentions in inbound agent context (#30252)

* fix(feishu): normalize all mentions in inbound agent context

Convert Feishu mention placeholders to explicit <at user_id="..."> tags (including bot mentions), add mention semantics hints for the model, and remove unused mentionMessageBody parsing to keep context handling consistent.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* fix(feishu): use replacer callback and escape only < > in normalizeMentions

Switch String.replace to a function replacer to prevent $ sequences in
display names from being interpolated as replacement patterns. Narrow
escaping to < and > only — & does not need escaping in LLM prompt tag
bodies and escaping it degrades readability (e.g. R&D → R&amp;D).
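
The difference a function replacer makes can be seen in a small sketch. The `MentionInfoLike` shape, the closing `</at>` tag, and the helper names are assumptions for illustration; only the function-replacer technique and the `<`/`>`-only escaping come from this commit.

```typescript
interface MentionInfoLike {
  key: string; // placeholder in the message text, e.g. "@_user_1"
  openId: string;
  name: string;
}

// Escape only < and >; "&" is left alone for readability (R&D stays R&D).
function escapeAngleBrackets(text: string): string {
  return text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeRegExp(source: string): string {
  return source.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function normalizeMentionsSketch(
  text: string,
  mentions: MentionInfoLike[],
): string {
  let result = text;
  for (const mention of mentions) {
    const tag =
      '<at user_id="' + mention.openId + '">' +
      escapeAngleBrackets(mention.name) + "</at>";
    // A function replacer's return value is inserted literally, so "$&" or
    // "$1" inside a display name is not treated as a replacement pattern.
    result = result.replace(
      new RegExp(escapeRegExp(mention.key), "g"),
      () => tag,
    );
  }
  return result;
}
```

Passing `tag` directly as a string would let a display name containing `$&` re-insert the matched placeholder into the output.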

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* fix(feishu): only use open_id in normalizeMentions tag, drop user_id fallback

When a mention has no open_id, degrade to @name instead of emitting
<at user_id="uid_...">. This keeps the tag user_id space exclusively
open_id, so the bot self-reference hint (which uses botOpenId) is
always consistent with what appears in the tags.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* fix(feishu): register mention strip