iOS Security Stack 1/5: Keychain Migrations + Tests #33029
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2e90bc3d7d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
  public func makeWebSocketTask(url: URL) -> WebSocketTaskBox {
      let task = self.session.webSocketTask(with: url)
-     task.maximumMessageSize = 16 * 1024 * 1024
+     task.maximumMessageSize = 4 * 1024 * 1024
Restore 16MB WebSocket receive limit for TLS-pinned sessions
This lowers GatewayTLSPinningSession to a 4MB receive cap, but TLS-enabled connections use this session path and can still receive large snapshot/history frames; the default channel keeps 16MB specifically to avoid "Message too long" failures for those payloads. As a result, secure connections can now fail on responses between 4MB and 16MB that previously worked, creating a regression that only affects TLS/pinned flows.
  let status = SecItemUpdate(query as CFDictionary, update as CFDictionary)
  if status == errSecSuccess { return true }
  if status != errSecItemNotFound { return false }
  SecItemDelete(query as CFDictionary)
Avoid deleting keychain value before replacement succeeds
The new delete-then-add write path can permanently erase an existing secret when SecItemAdd fails (for example, transient keychain availability errors), because the old item is removed first and many call sites ignore the boolean return value. Previously, update-first behavior preserved the existing credential on non-errSecItemNotFound failures, so this change introduces silent credential loss risk for stored gateway/token/password entries.
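A minimal sketch of the update-first ordering this comment argues for, assuming a generic-password item keyed by service and account. `saveStringUpdateFirst` is a hypothetical name, not the PR's helper; only the `SecItemUpdate` / `SecItemAdd` calls and status codes are real Security framework API.

```swift
import Foundation
#if canImport(Security)
import Security

/// Hypothetical helper (not the PR's code): store a UTF-8 string as a
/// generic password without ever deleting the existing item first.
func saveStringUpdateFirst(_ value: String, service: String, account: String) -> Bool {
    let query: [String: Any] = [
        kSecClass as String: kSecClassGenericPassword,
        kSecAttrService as String: service,
        kSecAttrAccount as String: account,
    ]
    let data = Data(value.utf8)
    let update: [String: Any] = [kSecValueData as String: data]

    // 1. Try to update in place. On success the old value is replaced atomically.
    let status = SecItemUpdate(query as CFDictionary, update as CFDictionary)
    if status == errSecSuccess { return true }

    // 2. Any failure other than "not found" leaves the old secret untouched.
    if status != errSecItemNotFound { return false }

    // 3. Nothing stored yet, so there is nothing to lose by adding.
    var insert = query
    insert[kSecValueData as String] = data
    return SecItemAdd(insert as CFDictionary, nil) == errSecSuccess
}
#endif
```

With this ordering a failed write returns `false` but never removes a credential, so call sites that ignore the return value degrade to "stale value" rather than "no value".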
Greptile Summary
This PR successfully migrates iOS gateway connection metadata and TLS fingerprints from UserDefaults to Keychain-backed storage, implements a sound migration path with fallback preservation, and hardens Keychain write semantics through delete-then-add operations. However, one security gap remains (detailed in the additional comment).
Confidence Score: 3/5
Last reviewed commit: 2e90bc3
Additional Comments (1)
Path: apps/ios/Sources/Gateway/GatewaySettingsStore.swift
Line: 469-511
Comment:
`applyFileProtection` is only invoked in `bootstrap()` at line 483. However, `truncateLogIfNeeded()` uses `tail.write(to: url, options: .atomic)` at line 451, which creates a brand-new file. The new file starts with iOS's default protection level — not `completeUntilFirstUserAuthentication`. Since `log()` can trigger truncation independently (every 50 writes), the protection attribute is silently lost long after `bootstrap()` ran.
Similarly, `appendToLog()` at line 465 uses `.atomic` write when the file doesn't exist, creating an unprotected file if `reset()` deletes the log and `log()` is called before `bootstrap()`.
To ensure the protection level is consistently maintained, `applyFileProtection` should also be called inside `log()` after truncation and after `appendToLog()` creates a new file:
```swift
if shouldTruncate {
self.truncateLogIfNeeded(url: url)
self.applyFileProtection(url: url)
}
```
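One way to avoid a window in which the rewritten file carries the default protection class is to request the class as part of the write itself. The sketch below is an assumption-laden illustration: `writeProtected` is a hypothetical helper, and it relies on `Data.WritingOptions.completeFileProtectionUntilFirstUserAuthentication` being honored together with `.atomic` on iOS, which should be verified on device.

```swift
import Foundation

/// Hypothetical helper: atomically rewrite a file and, on iOS, ask for the
/// "until first user authentication" protection class in the same call.
func writeProtected(_ data: Data, to url: URL) throws {
    var options: Data.WritingOptions = [.atomic]
    #if os(iOS)
    // Assumption: the log must stay readable in the background after first unlock.
    options.insert(.completeFileProtectionUntilFirstUserAuthentication)
    #endif
    try data.write(to: url, options: options)
}
```

Routing both the truncation write and the first-append write through a single helper like this would keep the protection level consistent without a separate `applyFileProtection` call after each one.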
💡 Codex Review
Reviewed commit: 185c77c665
  guard GenericPasswordKeychainStore.saveString(existing, service: self.keychainService, account: stableID) else {
      return
Preserve legacy TLS pin when Keychain migration fails
When migrateFromUserDefaultsIfNeeded cannot write to Keychain, it exits early and loadFingerprint immediately continues by reading only Keychain, returning nil even though a legacy pin still exists in UserDefaults. That downgrades trust behavior for this session (for example, apps/macos/Sources/OpenClaw/NodeMode/MacNodeModeCoordinator.swift enables TOFU when stored == nil), so a transient Keychain write failure can make a previously pinned gateway behave as unpinned.
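The fallback described here can be sketched with the storage layers abstracted behind closures. Everything below is hypothetical scaffolding (`PinStores`, the closure names, and this `loadFingerprint` signature are not from the PR); it only illustrates returning the legacy pin when the Keychain write fails.

```swift
import Foundation

/// Hypothetical seams standing in for the Keychain and legacy UserDefaults.
struct PinStores {
    var loadKeychain: (String) -> String?
    var saveKeychain: (String, String) -> Bool   // (fingerprint, stableID)
    var loadLegacy: (String) -> String?
    var removeLegacy: (String) -> Void
}

/// Returns the stored pin, migrating a legacy value opportunistically.
func loadFingerprint(stableID: String, stores: PinStores) -> String? {
    if let pinned = stores.loadKeychain(stableID) { return pinned }
    guard let legacy = stores.loadLegacy(stableID) else { return nil }

    if stores.saveKeychain(legacy, stableID) {
        // Migration succeeded, so the old copy can go.
        stores.removeLegacy(stableID)
    }
    // Even when the Keychain write failed, report the legacy pin so the
    // gateway is still treated as pinned for this session.
    return legacy
}
```

The key property is that a transient Keychain failure changes where the pin is read from, not whether a pin exists, so callers that enable TOFU on `nil` are not triggered.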
  guard let json = KeychainStore.loadString(
      service: self.gatewayService, account: self.lastGatewayConnectionAccount),
      let data = json.data(using: .utf8),
      let stored = try? JSONDecoder().decode(LastGatewayConnectionData.self, from: data)
  else { return nil }
Fall back to legacy last-connection data on migration failure
loadLastGatewayConnection now always decodes from Keychain after calling migration, but migrateLastGatewayFromUserDefaultsIfNeeded leaves legacy defaults in place when saveLastGatewayConnectionData fails. In that failure path this method still returns nil instead of reading the existing gateway.last.* values, which can drop reconnect/onboarding state until a later successful Keychain write.
92e3ee0 to 8d538be
💡 Codex Review
Reviewed commit: 8d538be268
  private static let logSizeCheckEveryWrites = 50
- nonisolated(unsafe) private static var logWritesSinceCheck = 0
+ private static let logWritesSinceCheck = OSAllocatedUnfairLock(initialState: 0)
+ private static let isoFormatter: ISO8601DateFormatter = {
Avoid shared ISO8601 formatter in concurrent logging
GatewayDiagnostics.log can be called from multiple threads, but this change introduced a single static ISO8601DateFormatter that is used before work is queued onto the serial logger queue. Because ISO8601DateFormatter is not Sendable/thread-safe, concurrent calls can race while formatting timestamps, which can lead to intermittent crashes or corrupted log timestamps under parallel gateway/talk events.
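A sketch of one way to sidestep the question of formatter thread safety: capture the `Date` on the calling thread and format it only on the serial queue. `DiagnosticsLogger` and its members are hypothetical and are not the PR's `GatewayDiagnostics` type.

```swift
import Foundation

/// Hypothetical logger: the formatter and buffer are only touched on `queue`,
/// which is what makes the `@unchecked Sendable` claim reasonable here.
final class DiagnosticsLogger: @unchecked Sendable {
    private let queue = DispatchQueue(label: "example.diagnostics.log") // serial by default
    private let formatter = ISO8601DateFormatter()
    private var lines: [String] = []

    /// Safe to call from any thread: only the value-type `Date` crosses threads.
    func log(_ message: String, at date: Date = Date()) {
        queue.async {
            let stamp = self.formatter.string(from: date)
            self.lines.append("[\(stamp)] \(message)")
        }
    }

    /// Waits for queued writes and returns a copy of the buffered lines.
    func snapshot() -> [String] {
        queue.sync { self.lines }
    }
}
```

The timestamp still reflects when `log` was called, since the `Date` is taken before the hop to the queue; only the string formatting is deferred.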
  /// On first Keychain read for a given stableID, move any legacy UserDefaults
  /// fingerprint into Keychain and remove the old entry.
  private static func migrateFromUserDefaultsIfNeeded(stableID: String) {
      guard let defaults = UserDefaults(suiteName: self.legacySuiteName) else { return }
Preserve standard-defaults fallback in TLS pin migration
Legacy TLS pins were previously read from UserDefaults(suiteName: ...) ?? .standard, but migration now returns immediately when the suite cannot be opened. In environments where that suite is unavailable, existing pins stored in .standard are never migrated or read, so loadFingerprint starts returning nil and previously pinned gateways are treated as unpinned after upgrade.
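A small sketch of a lookup that keeps the `.standard` fallback. Note this is a variation on the original `UserDefaults(suiteName: ...) ?? .standard` behavior: it consults the suite when it opens and then `.standard`, rather than picking exactly one store. `loadLegacyPin` and its parameters are hypothetical names.

```swift
import Foundation

/// Hypothetical lookup: check the legacy suite (if it can be opened), then `.standard`.
func loadLegacyPin(forKey key: String, suiteName: String) -> String? {
    // UserDefaults(suiteName:) is failable; a nil suite is simply skipped.
    let suite: UserDefaults? = UserDefaults(suiteName: suiteName)
    let candidates: [UserDefaults] = [suite, UserDefaults.standard].compactMap { $0 }

    for store in candidates {
        if let value = store.string(forKey: key), !value.isEmpty {
            return value
        }
    }
    return nil
}
```

A migration built on this would also need to remember which store the value came from so it removes the legacy entry from the right place after a successful Keychain write.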
0b88928 to 8e98cde
8e98cde to b37f3a2
b37f3a2 to da2f8f6
💡 Codex Review
Reviewed commit: da2f8f6141
  _ = SecItemDelete(query as CFDictionary)
  _ = SecItemAdd(rollback as CFDictionary, nil)
Prevent rollback from clobbering concurrent keychain writes
If two saveString calls race on the same (service, account), one SecItemAdd can fail with a duplicate after the other writer has already inserted the new value. This rollback path then deletes the currently stored item and restores previousData, which can revert the key to stale credentials instead of leaving the latest successful write in place. The regression is specific to overlapping writes, but it can silently persist old gateway secrets when contention happens.
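One possible shape for the add path that does not roll back over a concurrent writer: treat `errSecDuplicateItem` as "someone else inserted first" and update in place. `addOrUpdate` is a hypothetical helper, not the PR's code, and it assumes `query` already identifies the item by class, service, and account.

```swift
import Foundation
#if canImport(Security)
import Security

/// Hypothetical helper: add an item, and if another writer won the race,
/// update the existing item instead of deleting it and restoring old data.
func addOrUpdate(query: [String: Any], data: Data) -> OSStatus {
    var insert = query
    insert[kSecValueData as String] = data

    let addStatus = SecItemAdd(insert as CFDictionary, nil)
    // Success and every failure other than "duplicate" are reported as-is.
    guard addStatus == errSecDuplicateItem else { return addStatus }

    // A concurrent save already created the item. Overwrite its value;
    // nothing is deleted, so no stale `previousData` can be reinstated.
    let update: [String: Any] = [kSecValueData as String: data]
    return SecItemUpdate(query as CFDictionary, update as CFDictionary)
}
#endif
```

Under contention the last update wins, which matches ordinary last-writer semantics. Serializing `saveString` behind a lock would be an alternative if a strict ordering between callers matters.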
Merged via /review-pr -> /prepare-pr -> /merge-pr.
Prepared head SHA: da2f8f6
Co-authored-by: mbelinky <[email protected]>
Co-authored-by: mbelinky <[email protected]>
Reviewed-by: @mbelinky
* refactor(media): add shared ffmpeg helpers * fix(discord): harden voice ffmpeg path and opus fast-path * fix(ui): ensure GFM tables render in WebChat markdown (#20410) - Pass gfm:true + breaks:true explicitly to marked.parse() so table support is guaranteed even if global setOptions() is bypassed or reset by a future refactor (defense-in-depth) - Add display:block + overflow-x:auto to .chat-text table so wide multi-column tables scroll horizontally instead of being clipped by the parent overflow-x:hidden chat container - Add regression tests for GFM table rendering in markdown.test.ts * fix: webchat gfm table rendering and overflow (#32365) (thanks @BlueBirdBack) * refactor(tests): dedupe pi embedded test harness * refactor(tests): dedupe browser and config cli test setup * fix(test): harden discord lifecycle status sink typing * fix(gateway): harden message action channel fallback and startup grace Take the safe, tested subset from #32367:\n- per-channel startup connect grace in health monitor\n- tool-context channel-provider fallback for message actions\n\nCo-authored-by: Munem Hashmi <[email protected]> * docs(changelog): add landed notes for #32336 and #32364 * fix(test): tighten tool result typing in context pruning tests * fix(hooks): propagate run/tool IDs for tool hook correlation (#32360) * Plugin SDK: add run and tool call fields to tool hooks * Agents: propagate runId and toolCallId in before_tool_call * Agents: thread runId through tool wrapper context * Runner: pass runId into tool hook context * Compaction: pass runId into tool hook context * Agents: scope after_tool_call start data by run * Tests: cover run and tool IDs in before_tool_call hooks * Tests: add run-scoped after_tool_call collision coverage * Hooks: scope adjusted tool params by run * Tests: cover run-scoped adjusted param collisions * Hooks: preserve active tool start metadata until end * Changelog: add tool-hook correlation note * test: fix tsgo baseline test compatibility * 
feat(plugin-sdk): Add channelRuntime support for external channel plugins ## Overview This PR enables external channel plugins (loaded via Plugin SDK) to access advanced runtime features like AI response dispatching, which were previously only available to built-in channels. ## Changes ### src/gateway/server-channels.ts - Import PluginRuntime type - Add optional channelRuntime parameter to ChannelManagerOptions - Pass channelRuntime to channel startAccount calls via conditional spread - Ensures backward compatibility (field is optional) ### src/gateway/server.impl.ts - Import createPluginRuntime from plugins/runtime - Create and pass channelRuntime to channel manager ### src/channels/plugins/types.adapters.ts - Import PluginRuntime type - Add comprehensive documentation for channelRuntime field - Document available features, use cases, and examples - Improve type safety (use imported PluginRuntime type vs inline import) ## Benefits External channel plugins can now: - Generate AI-powered responses using dispatchReplyWithBufferedBlockDispatcher - Access routing, text processing, and session management utilities - Use command authorization and group policy resolution - Maintain feature parity with built-in channels ## Backward Compatibility - channelRuntime field is optional in ChannelGatewayContext - Conditional spread ensures it's only passed when explicitly provided - Existing channels without channelRuntime support continue to work unchanged - No breaking changes to channel plugin API ## Testing - Email channel plugin successfully uses channelRuntime for AI responses - All existing built-in channels (slack, discord, telegram, etc.) 
work unchanged - Gateway loads and runs without errors when channelRuntime is provided * fix: add channelRuntime regression coverage (#25462) (thanks @guxiaobo) * Feishu: cache failing probes (#29970) * Feishu: cache failing probes * Changelog: add Feishu probe failure backoff note --------- Co-authored-by: bmendonca3 <[email protected]> Co-authored-by: Tak Hoffman <[email protected]> * refactor(tests): dedupe agent lock and loop detection fixtures * refactor(tests): dedupe browser and telegram tool test fixtures * refactor(tests): dedupe web fetch and embedded tool hook fixtures * refactor(tests): dedupe cron delivered status assertions * refactor(outbound): unify channel selection and action input normalization * refactor(gateway): extract channel health policy and timing aliases * refactor(feishu): expose default-account selection source * plugin-sdk: expose onAgentEvent + onSessionTranscriptUpdate via PluginRuntime.events * fix: wrap transcript event listeners in try/catch to prevent throw propagation * style: fix indentation in transcript-events * fix: add missing events property to bluebubbles PluginRuntime mock * docs: add JSDoc to onSessionTranscriptUpdate * docs: expand JSDoc for onSessionTranscriptUpdate params and return * Fix styles * fix: add runtime.events regression tests (#16044) (thanks @scifantastic) * feat(hooks): add trigger and channelId to plugin hook agent context (#28623) * feat(hooks): add trigger and channelId to plugin hook agent context Adds `trigger` and `channelId` fields to `PluginHookAgentContext` so plugins can determine what initiated the agent run and which channel it originated from, without session-key parsing or Redis bridging. trigger values: "user", "heartbeat", "cron", "memory" channelId values: "telegram", "discord", "whatsapp", etc. Both fields are threaded through run.ts and attempt.ts hookCtx so all hook phases receive them (before_model_resolve, before_prompt_build, before_agent_start, llm_input, llm_output, agent_end). 
channelId falls back from messageChannel to messageProvider when the former is not set. followup-runner passes originatingChannel so queued followup runs also carry channel context. * docs(changelog): note hook context parity fix for #28623 --------- Co-authored-by: Vincent Koc <[email protected]> * feat(plugins): expose requestHeartbeatNow on plugin runtime Add requestHeartbeatNow to PluginRuntime.system so extensions can trigger an immediate heartbeat wake without importing internal modules. This enables extensions to inject a system event and wake the agent in one step — useful for inbound message handlers that use the heartbeat model (e.g. agent-to-agent DMs via Nostr). Changes: - src/plugins/runtime/types.ts: add RequestHeartbeatNow type alias and requestHeartbeatNow to PluginRuntime.system - src/plugins/runtime/index.ts: import and wire requestHeartbeatNow into createPluginRuntime() * fix: add requestHeartbeatNow to bluebubbles test mock * fix: add requestHeartbeatNow runtime coverage (#19464) (thanks @AustinEral) * fix: force supportsDeveloperRole=false for non-native OpenAI endpoints (#29479) Merged via squash. 
Prepared head SHA: 1416c584ac4cdc48af9f224e3d870ef40900c752 Co-authored-by: akramcodez <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras * test(agents): tighten pi message typing and dedupe malformed tool-call cases * refactor(test): extract cron issue-regression harness and frozen-time helper * fix(test): resolve upstream typing drift in feishu and cron suites * fix(security): pin tlon api source and secure hold music url * Plugins: add sessionKey to session lifecycle hooks * fix: add session hook context regression tests (#26394) (thanks @tempeste) * refactor(tests): dedupe isolated agent cron turn assertions * refactor(tests): dedupe cron store migration setup * refactor(tests): dedupe discord monitor e2e fixtures * refactor(tests): dedupe manifest registry link fixture setup * refactor(tests): dedupe security fix scenario helpers * refactor(tests): dedupe media transcript echo config setup * refactor(tests): dedupe tools invoke http request helpers * feat(memory): add Ollama embedding provider (#26349) Merged via /review-pr -> /prepare-pr -> /merge-pr. Prepared head SHA: ac413865431614c352c3b29f2dfccc5593f0605a Co-authored-by: nico-hoff <[email protected]> Co-authored-by: gumadeiras <[email protected]> Reviewed-by: @gumadeiras * fix(sessions): preserve idle reset timestamp on inbound metadata * fix: preserve idle reset timestamp on inbound metadata writes (#32379) (thanks @romeodiaz) * fix(slack): fail fast on non-recoverable auth errors instead of retry loop When a Slack bot is removed from a workspace while still configured in OpenClaw, the gateway enters an infinite retry loop on account_inactive or invalid_auth errors, making the entire gateway unresponsive. Add isNonRecoverableSlackAuthError() to detect permanent credential failures (account_inactive, invalid_auth, token_revoked, etc.) and throw immediately instead of retrying. 
This mirrors how the Telegram provider already distinguishes recoverable network errors from fatal auth errors via isRecoverableTelegramNetworkError(). The check is applied in both the startup catch block and the disconnect reconnect path so stale credentials always fail fast with a clear error message. Closes #32366 Co-Authored-By: Claude Opus 4.6 <[email protected]> * fix: fail fast on non-recoverable slack auth errors (#32377) (thanks @scoootscooob) * fix(cli): fail fast on unsupported Node versions in install and runtime paths Surface a clear Node 22.12+ requirement before npm/install bootstrap work so users avoid misleading downstream errors. - Add installer shell preflight to block active Node <22 and suggest NVM recovery commands - Add openclaw.mjs runtime preflight for npm/npx usage with explicit Node version guidance - Keep messaging actionable for both NVM and non-NVM environments * fix(cli): align Node 22.12 preflight checks and clean runtime guard output Tighten installer/runtime consistency so users on Node 22.0-22.11 are blocked before install/runtime drift, with cleaner CLI guidance. - Enforce Node >=22.12 in scripts/install.sh preflight checks - Align installer messages to the same 22.12+ runtime floor - Replace openclaw.mjs thrown version error with stderr+exit to avoid noisy stack traces * fix: enforce node v22.12+ preflight for installer and runtime (#32356) (thanks @jasonhargrove) * fix(webui): prevent inline code from breaking mid-token on copy/paste The parent `.chat-text` applies `overflow-wrap: anywhere; word-break: break-word;` which forces long tokens (UUIDs, hashes) inside inline `<code>` to break across visual lines. When copied, the browser injects spaces at those break points, corrupting the pasted value. Override with `overflow-wrap: normal; word-break: keep-all;` on inline `<code>` selectors so tokens stay intact. 
Fixes #32230 Signed-off-by: HCL <[email protected]> * fix: preserve inline code copy fidelity in web ui (#32346) (thanks @hclsys) * refactor: modularize plugin runtime and test hooks * fix(agents): treat host edit tool as success when file contains newText after upstream throw (fixes #32333) * fix(agents): only recover edit when oldText no longer in file (review feedback) * fix: recover host edit success after post-write upstream throw (#32383) (thanks @polooooo) * CLI: unify routed config positional parsing * test(agents): centralize AgentMessage fixtures and remove unsafe casts * test(security): reduce audit fixture setup overhead * test(telegram): dedupe streaming cases and tighten sequential key checks * refactor(agents): split pi-tools param and host-edit wrappers * refactor(sessions): add explicit merge activity policies * refactor(slack): extract socket reconnect policy helpers * refactor(runtime): unify node version guard parsing * refactor(ui): dedupe inline code wrap rules * fix: resolve pi-tools typing regressions * refactor(telegram): extract sequential key module * test(telegram): move preview-finalization cases to lane unit tests * perf(agents): cache per-pass context char estimates * perf(security): allow audit snapshot and summary cache reuse * fix(docker): correct awk quoting in Docker GPG fingerprint check (#32153) * fix(acp): use publishable acpx install hint * Agents: add context metadata warmup retry backoff * fix(exec): suggest increasing timeout on timeouts * fix(gemini-cli-auth): use PLATFORM_UNSPECIFIED for Linux in loadCodeAssist Google's loadCodeAssist API rejects "LINUX" as an invalid Platform enum value, causing OAuth setup to fail with 400 Bad Request on Linux systems. The pi-ai runtime already uses "PLATFORM_UNSPECIFIED" for this field. This aligns the extension's discoverProject() with that approach by returning "PLATFORM_UNSPECIFIED" for Linux (and other non-Windows/macOS platforms) instead of "LINUX". 
Also fixes the original resolvePlatform() which incorrectly fell through to "MACOS" as default instead of explicitly checking for "darwin". * chore: remove unreachable "LINUX" from resolvePlatform return type Address review feedback: since resolvePlatform() no longer returns "LINUX", remove it from the union type to prevent future confusion. * fix(agents): recognize connection errors as retryable timeout failures (#31697) * fix(agents): recognize connection errors as retryable timeout failures ## Problem When a model endpoint becomes unreachable (e.g., local proxy down, relay server offline), the failover system fails to switch to the next candidate model. Errors like "Connection error." are not classified as retryable, causing the session to hang on a broken endpoint instead of falling back to healthy alternatives. ## Root Cause Connection/network errors are not recognized by the current failover classifier: - Text patterns like "Connection error.", "fetch failed", "network error" - Error codes like ECONNREFUSED, ENOTFOUND, EAI_AGAIN (in message text) While `failover-error.ts` handles these as error codes (err.code), it misses them when they appear as plain text in error messages. ## Solution Extend timeout error patterns to include connection/network failures: **In `errors.ts` (ERROR_PATTERNS.timeout):** - Text: "connection error", "network error", "fetch failed", etc. - Regex: /\beconn(?:refused|reset|aborted)\b/i, /\benotfound\b/i, /\beai_again\b/i **In `failover-error.ts` (TIMEOUT_HINT_RE):** - Same patterns for non-assistant error paths ## Testing Added test cases covering: - "Connection error." 
- "fetch failed" - "network error: ECONNREFUSED" - "ENOTFOUND" / "EAI_AGAIN" in message text ## Impact - **Compatibility:** High - only expands retryable error detection - **Behavior:** Connection failures now trigger automatic fallback - **Risk:** Low - changes are additive and well-tested * style: fix code formatting for test file * refactor: split plugin runtime type contracts * refactor: extract session init helpers * ci: move changed-scope logic into tested script * test: consolidate extension runtime mocks and split bluebubbles webhook auth suite * fix(voice-call): prevent EADDRINUSE by guarding webhook server lifecycle Three issues caused the port to remain bound after partial failures: 1. VoiceCallWebhookServer.start() had no idempotency guard — calling it while the server was already listening would create a second server on the same port. 2. createVoiceCallRuntime() did not clean up the webhook server if a step after webhookServer.start() failed (e.g. manager.initialize). The server kept the port bound while the runtime promise rejected. 3. ensureRuntime() cached the rejected promise forever, so subsequent calls would re-throw the same error without ever retrying. Combined with (2), the port stayed orphaned until gateway restart. 
Fixes #32387 Co-Authored-By: Claude Opus 4.6 <[email protected]> * fix(voice-call): harden webhook lifecycle cleanup and retries (#32395) (thanks @scoootscooob) * fix: unblock build type errors * ci: scale Windows CI runner and test workers * refactor(test): dedupe telegram draft-stream fixtures * refactor(telegram): split lane preview target helpers * refactor(agents): split tool-result char estimator * refactor(security): centralize audit execution context * fix(scripts/pr): SSH-first prhead remote with GraphQL fallback for fork PRs (#32126) Co-authored-by: Shakker <[email protected]> * ci: use valid Blacksmith Windows runner label * CI: add sticky-disk toggle to setup node action * CI: add sticky-disk mode to pnpm cache action * CI: enable sticky-disk pnpm cache on Linux CI jobs * CI: migrate docker release build cache to Blacksmith * CI: use Blacksmith docker builder in install smoke * CI: use Blacksmith docker builder in sandbox smoke * refactor(agents): share failover error matchers * refactor(cli): extract plugin install plan helper * refactor(voice-call): unify runtime cleanup lifecycle * refactor(acp): extract install hint resolver * refactor(tests): dedupe control ui auth pairing fixtures * refactor(tests): dedupe gateway chat history fixtures * refactor(memory): dedupe openai batch fetch flows * refactor(tests): dedupe model compat assertions * refactor(acp): dedupe runtime option command plumbing * refactor(config): dedupe repeated zod schema shapes * refactor(browser): dedupe playwright interaction helpers * refactor(tests): dedupe openresponses http fixtures * refactor(tests): dedupe agent handler test scaffolding * refactor(tests): dedupe session store route fixtures * refactor(tests): dedupe canvas host server setup * refactor(security): dedupe telegram allowlist validation loops * refactor(config): dedupe session store save error handling * refactor(gateway): dedupe agents server-method handlers * refactor(infra): dedupe exec approval allowlist 
evaluation flow * refactor(infra): dedupe ssrf fetch guard test fixtures * refactor(infra): dedupe update startup test setup * refactor(memory): dedupe readonly recovery test scenarios * refactor(agents): dedupe ollama provider test scaffolding * refactor(agents): dedupe steer restart test replacement flow * refactor(telegram): dedupe monitor retry test helpers * feat(secrets): expand SecretRef coverage across user-supplied credentials (#29580) * feat(secrets): expand secret target coverage and gateway tooling * docs(secrets): align gateway and CLI secret docs * chore(protocol): regenerate swift gateway models for secrets methods * fix(config): restore talk apiKey fallback and stabilize runner test * ci(windows): reduce test worker count for shard stability * ci(windows): raise node heap for test shard stability * test(feishu): make proxy env precedence assertion windows-safe * fix(gateway): resolve auth password SecretInput refs for clients * fix(gateway): resolve remote SecretInput credentials for clients * fix(secrets): skip inactive refs in command snapshot assignments * fix(secrets): scope gateway.remote refs to effective auth surfaces * fix(secrets): ignore memory defaults when enabled agents disable search * fix(secrets): honor Google Chat serviceAccountRef inheritance * fix(secrets): address tsgo errors in command and gateway collectors * fix(secrets): avoid auth-store load in providers-only configure * fix(gateway): defer local password ref resolution by precedence * fix(secrets): gate telegram webhook secret refs by webhook mode * fix(secrets): gate slack signing secret refs to http mode * fix(secrets): skip telegram botToken refs when tokenFile is set * fix(secrets): gate discord pluralkit refs by enabled flag * fix(secrets): gate discord voice tts refs by voice enabled * test(secrets): make runtime fixture modes explicit * fix(cli): resolve local qr password secret refs * fix(cli): fail when gateway leaves command refs unresolved * fix(gateway): fail 
when local password SecretRef is unresolved * fix(gateway): fail when required remote SecretRefs are unresolved * fix(gateway): resolve local password refs only when password can win * fix(cli): skip local password SecretRef resolution on qr token override * test(gateway): cast SecretRef fixtures to OpenClawConfig * test(secrets): activate mode-gated targets in runtime coverage fixture * fix(cron): support SecretInput webhook tokens safely * fix(bluebubbles): support SecretInput passwords across config paths * fix(msteams): make appPassword SecretInput-safe in onboarding/token paths * fix(bluebubbles): align SecretInput schema helper typing * fix(cli): clarify secrets.resolve version-skew errors * refactor(secrets): return structured inactive paths from secrets.resolve * refactor(gateway): type onboarding secret writes as SecretInput * chore(protocol): regenerate swift models for secrets.resolve * feat(secrets): expand extension credential secretref support * fix(secrets): gate web-search refs by active provider * fix(onboarding): detect SecretRef credentials in extension status * fix(onboarding): allow keeping existing ref in secret prompt * fix(onboarding): resolve gateway password SecretRefs for probe and tui * fix(onboarding): honor secret-input-mode for local gateway auth * fix(acp): resolve gateway SecretInput credentials * fix(secrets): gate gateway.remote refs to remote surfaces * test(secrets): cover pattern matching and inactive array refs * docs(secrets): clarify secrets.resolve and remote active surfaces * fix(bluebubbles): keep existing SecretRef during onboarding * fix(tests): resolve CI type errors in new SecretRef coverage * fix(extensions): replace raw fetch with SSRF-guarded fetch * test(secrets): mark gateway remote targets active in runtime coverage * test(infra): normalize home-prefix expectation across platforms * fix(cli): only resolve local qr password refs in password mode * test(cli): cover local qr token mode with unresolved password ref 
* docs(cli): clarify local qr password ref resolution behavior
* refactor(extensions): reuse sdk SecretInput helpers
* fix(wizard): resolve onboarding env-template secrets before plaintext
* fix(cli): surface secrets.resolve diagnostics in memory and qr
* test(secrets): repair post-rebase runtime and fixtures
* fix(gateway): skip remote password ref resolution when token wins
* fix(secrets): treat tailscale remote gateway refs as active
* fix(gateway): allow remote password fallback when token ref is unresolved
* fix(gateway): ignore stale local password refs for none and trusted-proxy
* fix(gateway): skip remote secret ref resolution on local call paths
* test(cli): cover qr remote tailscale secret ref resolution
* fix(secrets): align gateway password active-surface with auth inference
* fix(cli): resolve inferred local gateway password refs in qr
* fix(gateway): prefer resolvable remote password over token ref pre-resolution
* test(gateway): cover none and trusted-proxy stale password refs
* docs(secrets): sync qr and gateway active-surface behavior
* fix: restore stability blockers from pre-release audit
* Secrets: fix collector/runtime precedence contradictions
* docs: align secrets and web credential docs
* fix(rebase): resolve integration regressions after main rebase
* fix(node-host): resolve gateway secret refs for auth
* fix(secrets): harden secretinput runtime readers
* gateway: skip inactive auth secretref resolution
* cli: avoid gateway preflight for inactive secret refs
* extensions: allow unresolved refs in onboarding status
* tests: fix qr-cli module mock hoist ordering
* Security: align audit checks with SecretInput resolution
* Gateway: resolve local-mode remote fallback secret refs
* Node host: avoid resolving inactive password secret refs
* Secrets runtime: mark Slack appToken inactive for HTTP mode
* secrets: keep inactive gateway remote refs non-blocking
* cli: include agent memory secret targets in runtime resolution
* docs(secrets): sync docs with active-surface and web search behavior
* fix(secrets): keep telegram top-level token refs active for blank account tokens
* fix(daemon): resolve gateway password secret refs for probe auth
* fix(secrets): skip IRC NickServ ref resolution when NickServ is disabled
* fix(secrets): align token inheritance and exec timeout defaults
* docs(secrets): clarify active-surface notes in cli docs
* cli: require secrets.resolve gateway capability
* gateway: log auth secret surface diagnostics
* secrets: remove dead provider resolver module
* fix(secrets): restore gateway auth precedence and fallback resolution
* fix(tests): align plugin runtime mock typings
---------
Co-authored-by: Peter Steinberger
* CI: optimize Windows lane by splitting bundle and dropping duplicate lanes
* docs: reorder unreleased changelog by user interest
* fix(e2e): include shared tool display resource in onboard docker build
* fix(ci): tighten type signatures in gateway params validation
* fix(telegram): move unchanged command-sync log to verbose
* test: fix strict runtime mock types in channel tests
* test: load ci changed-scope script via esm import
* fix(swift): align async helper callsites across iOS and macOS
* refactor(macos): simplify pairing alert and host helper paths
* style(swift): apply lint and format cleanup
* CI: gate Windows checks by windows-relevant scope (#32456)
* CI: add windows scope output for changed-scope
* Test: cover windows scope gating in changed-scope
* CI: gate checks-windows by windows scope
* Docs: update CI windows scope and runner label
* CI: move checks-windows to 32 vCPU runner
* Docs: align CI windows runner with workflow
* CI: allow blacksmith 32 vCPU Windows runner in actionlint
* chore(gitignore): ignore android kotlin cache
* fix(ci): restore scope-test require import and sync host policy
* docs(changelog): add SecretRef note for #29580
* fix(venice): retry model discovery on transient fetch failures
* fix(feishu): preserve block streaming text when final payload is missing (#30663)
* fix(feishu): preserve block streaming text when final payload is missing
  When Feishu card streaming receives block payloads without matching final/partial callbacks, keep block text in stream state so onIdle close still publishes the reply instead of an empty message.
  Add a regression test for block-only streaming.
  Closes #30628
* Feishu: preserve streaming block fallback when final text is missing
---------
Co-authored-by: Tak Hoffman
* CI: add exact-key mode for pnpm cache restore
* CI: reduce pre-test Windows setup latency
* CI: increase checks-windows test shards to 3
* CI: increase checks-windows test shards to 4
* docs(feishu): Feishu docs – add verificationToken and align zh-CN with EN (openclaw#31555) thanks @xbsheng
  Verified:
  - pnpm build
  - pnpm test:macmini
  - pnpm check (blocked locally by pre-existing mainline lint issue in src/scripts/ci-changed-scope.test.ts unrelated to this PR)
  Co-authored-by: xbsheng
  Co-authored-by: Tak Hoffman
* fix(ci): avoid shell interpolation in changed-scope git diff
* test(ci): add changed-scope shell-injection regression
* fix(gateway): retry exec-read live tool probe
* feat(feishu): add broadcast support for multi-agent groups (#29575)
* feat(feishu): add broadcast support for multi-agent group observation
  When multiple agents share a Feishu group chat, only the @mentioned agent receives the message. This prevents observer agents from building session memory of group activity they weren't directly addressed in.
  Adds broadcast support (reusing the same cfg.broadcast schema as WhatsApp) so all configured agents receive every group message in their session transcripts. Only the @mentioned agent responds on Feishu; observer agents process silently via no-op dispatchers.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): guard sequential broadcast dispatch against single-agent failure
  Wrap each dispatchForAgent() call in the sequential loop with try/catch so one agent's dispatch failure doesn't abort delivery to remaining agents.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): avoid duplicate messages in broadcast observer mode and normalize agent IDs
  - Skip recordPendingHistoryEntryIfEnabled for broadcast groups when not mentioned, since the message is dispatched directly to all agents. Previously the message appeared twice in the agent prompt.
  - Normalize agent IDs with toLowerCase() before membership checks so config casing mismatches don't silently skip valid agents.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): set WasMentioned per-agent and normalize broadcast IDs
  - buildCtxPayloadForAgent now takes a wasMentioned parameter so active agents get WasMentioned=true and observers get false (P1 fix)
  - Normalize broadcastAgents to lowercase at resolution time and lowercase activeAgentId so all comparisons and session key generation use canonical IDs regardless of config casing (P2 fix)
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): canonicalize broadcast agent IDs with normalizeAgentId
* fix(feishu): match ReplyDispatcher sync return types for noop dispatcher
  The upstream ReplyDispatcher changed sendToolResult/sendBlockReply/sendFinalReply to synchronous (returning boolean). Update the broadcast observer noop dispatcher to match.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): deduplicate broadcast agent IDs after normalization
  Config entries like "Main" and "main" collapse to the same canonical ID after normalizeAgentId but were dispatched multiple times. Use Set to deduplicate after normalization.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): honor requireMention=false when selecting broadcast responder
  When requireMention is false, the routed agent should be active (reply on Feishu) even without an explicit @mention. Previously activeAgentId was null whenever ctx.mentionedBot was false, so all agents got the noop dispatcher and no reply was sent, silently breaking groups that disabled mention gating.
  Hoist requireMention out of the if(isGroup) block so it's accessible in the dispatch code.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): cross-account broadcast dedup to prevent duplicate dispatches
  In multi-account Feishu setups, the same message event is delivered to every bot account in a group. Without cross-account dedup, each account independently dispatches broadcast agents, causing 2×N dispatches instead of N (where N = number of broadcast agents).
  Two changes:
  1. requireMention=true + bot not mentioned: return early instead of falling through to broadcast. The mentioned bot's handler will dispatch for all agents. Non-mentioned handlers record to history.
  2. Add cross-account broadcast dedup using a shared 'broadcast' namespace (tryRecordMessagePersistent). The first handler to reach the broadcast block claims the message; subsequent accounts skip. This handles the requireMention=false multi-account case.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): strip CommandAuthorized from broadcast observer contexts
  Broadcast observer agents inherited CommandAuthorized from the sender, causing slash commands (e.g. /reset) to silently execute on every observer session. Now only the active agent retains CommandAuthorized; observers have it stripped before dispatch.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): use actual mention state for broadcast WasMentioned
  The active broadcast agent's WasMentioned was set to true whenever requireMention=false, even when the bot was not actually @mentioned. Now uses ctx.mentionedBot && agentId === activeAgentId, consistent with the single-agent path which passes ctx.mentionedBot directly.
  Co-Authored-By: Claude Opus 4.6
* fix(feishu): skip history buffer for broadcast accounts and log parallel failures
  1. In requireMention groups with broadcast, non-mentioned accounts no longer buffer pending history; the mentioned handler's broadcast dispatch already writes turns into all agent sessions. Buffering caused duplicate replay via buildPendingHistoryContextFromMap.
  2. Parallel broadcast dispatch now inspects Promise.allSettled results and logs rejected entries, matching the sequential path's per-agent error logging.
  Co-Authored-By: Claude Opus 4.6
* Changelog: note Feishu multi-agent broadcast dispatch
* Changelog: restore author credit for Feishu broadcast entry
---------
Co-authored-by: Claude Opus 4.6
Co-authored-by: Tak Hoffman
* chore(release): prepare 2026.3.2-beta.1
* fix(ci): complete feishu route mock typing in broadcast tests
* Delete changelog/fragments directory
* refactor(feishu): unify Lark SDK error handling with LarkApiError (#31450)
* refactor(feishu): unify Lark SDK error handling with LarkApiError
  - Add LarkApiError class with code, api, and context fields for better diagnostics
  - Add ensureLarkSuccess helper to replace 9 duplicate error check patterns
  - Update tool registration layer to return structured error info (code, api, context)
  This improves:
  - Observability: errors now include API name and request context for easier debugging
  - Maintainability: single point of change for error handling logic
  - Extensibility: foundation for retry strategies, error classification, etc.
  Affected APIs:
  - wiki.space.getNode
  - bitable.app.get
  - bitable.app.create
  - bitable.appTableField.list
  - bitable.appTableField.create
  - bitable.appTableRecord.list
  - bitable.appTableRecord.get
  - bitable.appTableRecord.create
  - bitable.appTableRecord.update
* Changelog: note Feishu bitable error handling unification
---------
Co-authored-by: echoVic
Co-authored-by: Tak Hoffman
* fix(gateway): skip google rate limits in live suite
* CI: add toggle to skip pnpm actions cache restore
* CI: shard Windows tests into sixths and skip cache restore
* chore(release): update appcast for 2026.3.2-beta.1
* fix(feishu): correct invalid scope name in permission grant URL (#32509)
* fix(feishu): correct invalid scope name in permission grant URL
  The Feishu API returns error code 99991672 with an authorization URL containing the non-existent scope `contact:contact.base:readonly` when the `contact.user.get` endpoint is called without the correct permission. The valid scope is `contact:user.base:readonly`.
  Add a scope correction map that replaces known incorrect scope names in the extracted grant URL before presenting it to the user/agent, so the authorization link actually works.
  Closes #31761
* chore(changelog): note feishu scope correction
---------
Co-authored-by: SidQin-cyber
* docs: add dedicated pdf tool docs page
* feishu, line: pass per-group systemPrompt to inbound context (#31713)
* feishu: pass per-group systemPrompt to inbound context
  The Feishu extension schema supports systemPrompt in per-group config (channels.feishu.accounts.<id>.groups.<groupId>.systemPrompt) but the value was never forwarded to the inbound context as GroupSystemPrompt.
  This means per-group system prompts configured for Feishu had no effect, unlike IRC, Discord, Slack, Telegram, Matrix, and other channels that already pass this field correctly.
  Co-authored-by: Copilot
* line: pass per-group systemPrompt to inbound context
  Same issue as feishu: the Line config schema defines systemPrompt in per-group config but the value was never forwarded as GroupSystemPrompt in the inbound context payload.
  Added resolveLineGroupSystemPrompt helper that mirrors the existing resolveLineGroupConfig lookup logic (groupId > roomId > wildcard).
  Co-authored-by: Copilot
* Changelog: note Feishu and LINE group systemPrompt propagation
---------
Co-authored-by: Copilot
Co-authored-by: Tak Hoffman
* docs(changelog): remove docs-only 2026.3.2 entries
* CI: speed up Windows dependency warmup
* CI: make node deps install optional in setup action
* CI: reduce critical path for check build and windows jobs
* fix(feishu): non-blocking WS ACK and preserve full streaming card content (#29616)
* fix(feishu): non-blocking ws ack and preserve streaming card full content
* fix(feishu): preserve fragmented streaming text without newline artifacts
---------
Co-authored-by: Tak Hoffman
* fix: add session-memory hook support for Feishu provider (#31437)
* fix: add session-memory hook support for Feishu provider
  Issue #31275: Session-memory hook not triggered when using /new command in Feishu
  - Added command handler to Feishu provider
  - Integrated with OpenClaw's before_reset hook system
  - Ensures session memory is saved when /new or /reset commands are used
* Changelog: note Feishu session-memory hook parity
---------
Co-authored-by: Tak Hoffman
* fix(feishu): guard against false-positive @mentions in multi-app groups (#30315)
* fix(feishu): guard against false-positive @mentions in multi-app groups
  When multiple Feishu bot apps share a group chat, Feishu's WebSocket event delivery remaps the open_id in mentions[] per-app.
  This causes checkBotMentioned() to return true for ALL bots when only one was actually @mentioned, making requireMention ineffective.
  Add a botName guard: if the mention's open_id matches this bot but the mention's display name differs from this bot's configured botName, treat it as a false positive and skip.
  botName is already available via account.config.botName (set during onboarding).
  Closes #24249
* fix(feishu): support @all mention in multi-bot groups
  When a user sends @all (@_all in Feishu message content), treat it as mentioning every bot so all agents respond when requireMention is true.
  Feishu's @all does not populate the mentions[] array, so this needs explicit content-level detection.
* fix(feishu): auto-fetch bot display name from API for reliable mention matching
  Instead of relying on the manually configured botName (which may differ from the actual Feishu bot display name), fetch the bot's display name from the Feishu API at startup via probeFeishu().
  This ensures checkBotMentioned() always compares against the correct display name, even when the config botName doesn't match (e.g. config says 'Wanda' but Feishu shows '绯红女巫').
  Changes:
  - monitor.ts: fetchBotOpenId → fetchBotInfo (returns both openId and name)
  - monitor.ts: store botNames map, pass botName to handleFeishuMessage
  - bot.ts: accept botName from params, prefer it over config fallback
* Changelog: note Feishu multi-app mention false-positive guard
---------
Co-authored-by: Teague Xiao
Co-authored-by: Tak Hoffman
* CI: start push test lanes earlier and drop check gating
* CI: disable flaky sticky disk mount for Windows pnpm setup
* chore(release): cut 2026.3.2
* fix(feishu): normalize all mentions in inbound agent context (#30252)
* fix(feishu): normalize all mentions in inbound agent context
  Convert Feishu mention placeholders to explicit <at user_id="..."> tags (including bot mentions), add mention semantics hints for the model, and remove unused mentionMessageBody parsing to keep context handling consistent.
  Co-Authored-By: Claude Sonnet 4.6
* fix(feishu): use replacer callback and escape only < > in normalizeMentions
  Switch String.replace to a function replacer to prevent $ sequences in display names from being interpolated as replacement patterns. Narrow escaping to < and > only; & does not need escaping in LLM prompt tag bodies and escaping it degrades readability (e.g. R&D → R&amp;D).
  Co-Authored-By: Claude Sonnet 4.6
* fix(feishu): only use open_id in normalizeMentions tag, drop user_id fallback
  When a mention has no open_id, degrade to @name instead of emitting <at user_id="uid_...">. This keeps the tag user_id space exclusively open_id, so the bot self-reference hint (which uses botOpenId) is always consistent with what appears in the tags.
  Co-Authored-By: Claude Sonnet 4.6
* fix(feishu): register mention strip
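For readers unfamiliar with the `$`-interpolation problem that the normalizeMentions commits in the log above describe, here is a minimal TypeScript sketch. It is not the extension's actual code: the `MentionSketch` shape, its field names, and both function names are hypothetical, chosen only to illustrate the three behaviors the commit messages state (function replacer, escaping only `<` and `>`, and degrading to `@name` when there is no open_id).

```typescript
// Hypothetical mention shape; the field names are assumptions, not the extension's real types.
interface MentionSketch {
  key: string; // placeholder as it appears in the message text, e.g. "@_user_1"
  name: string; // display name shown in Feishu
  openId?: string; // open_id, when the event provides one
}

// Escape only < and > so a display name cannot open or close a tag; "&" stays readable.
function escapeAngleBrackets(value: string): string {
  return value.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function normalizeMentionsSketch(text: string, mentions: MentionSketch[]): string {
  let result = text;
  for (const mention of mentions) {
    const safeName = escapeAngleBrackets(mention.name);
    // No open_id: degrade to a plain @name instead of emitting a tag.
    const replacement = mention.openId
      ? `<at user_id="${mention.openId}">${safeName}</at>`
      : `@${safeName}`;
    // A function replacer's return value is inserted literally, so "$&" or "$1"
    // inside a display name is not treated as a replacement pattern.
    // With a string pattern, only the first occurrence of the placeholder is replaced.
    result = result.replace(mention.key, () => replacement);
  }
  return result;
}
```

Passing the replacement as a plain string instead of `() => replacement` would let a display name such as `R&D $&` splice the matched placeholder back into the output, which is the failure mode the commit message refers to.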
Part 1/5 of iOS security stack (split from #31979).
Scope
Files
Stack
mainstack/ios-sec-02-concurrency-locks)