Skip to content

channels: make plugin state persistence best effort by default#75053

Closed
amknight wants to merge 3 commits intomainfrom
ak/plugin-fixes-sdk-default-on
Closed

channels: make plugin state persistence best effort by default#75053
amknight wants to merge 3 commits intomainfrom
ak/plugin-fixes-sdk-default-on

Conversation

@amknight
Copy link
Copy Markdown
Member

Summary

Alternative to #74933 that removes the per-plugin experimentalPersistentState opt-in and instead treats the SQLite-backed SDK state store as best-effort durability behind the existing in-memory caches.

The existing process-local maps/dedupe caches remain the fast path and source of truth. Once a bundled plugin runtime is available, each migrated cache automatically tries api.runtime.state.openKeyedStore. If opening the store or any store operation fails, the plugin logs the failure, disables further persistent-store use for that process, and continues with the previous process-local behavior.

Migrations

Plugin Cache Namespace Behavior
Slack thread participation (sent-thread-cache) slack.thread-participation memory first; persistent lookup on miss; backfills memory after restart hit
Discord component & modal registries (components-registry) discord.components, discord.modals memory first; persistent lookup/consume on miss; persistent delete when in-memory consume drains an entry
MS Teams sent-message dedupe (sent-message-cache) msteams.sent-messages memory first; persistent lookup on miss; backfills memory after restart hit
Matrix approval reaction targets (approval-reactions) matrix.approval-reactions memory first; persistent lookup on miss; persistent delete on unregister

Why this shape

This keeps the old behavior as the fallback path without requiring operators to opt in through config:

  • No experimentalPersistentState manifest schema entries.
  • No resolvePluginConfigObject(...) gates in the hot paths.
  • No cfg threading solely for persistence.
  • SQLite availability or corruption cannot break message handling.

The behavior change when SQLite works is restart durability for existing bundled plugin caches. When SQLite is unavailable, behavior falls back to the pre-persistence process-local cache behavior.

Tests

Added/updated tests for:

  • persistent read/write behavior when runtime state is available
  • in-memory hot-path backfill after persistent hits for Slack/MS Teams
  • unavailable SQLite fallback for Slack, Discord, MS Teams, and Matrix
  • relevant call-site wiring for Slack, Discord, Matrix, and MS Teams

Validation

Local from ../openclaw-plugin-fixes-sdk-default-on:

  • pnpm test extensions/slack/src/sent-thread-cache.test.ts extensions/discord/src/components.test.ts extensions/msteams/src/sent-message-cache.test.ts extensions/matrix/src/approval-reactions.test.ts — passed
  • pnpm test extensions/slack/src/action-runtime.test.ts extensions/slack/src/monitor/message-handler/prepare.test.ts extensions/discord/src/monitor/monitor.test.ts extensions/matrix/src/matrix/monitor/reaction-events.test.ts extensions/matrix/src/approval-handler.runtime.test.ts extensions/msteams/src/monitor-handler/message-handler.authz.test.ts — passed
  • pnpm exec oxfmt --check ... on touched TS files — passed
  • git diff HEAD --check — passed
  • pnpm check:changed — passed

Notes

This branch is intentionally separate from #74933 so reviewers can compare the opt-in version against the default-on best-effort version.

…ams, matrix

Adds plugins.entries.<id>.config.experimentalPersistentState (default
false) to slack/discord/msteams/matrix, mirroring four in-memory caches
into the SDK-backed plugin state store while keeping the existing
process-local fast paths as the default behavior:

- Slack thread participation (sent-thread-cache)
- Discord component & modal registries (components-registry)
- MS Teams sent-message dedupe (sent-message-cache)
- Matrix approval reaction targets (approval-reactions)

Errors from the persistent path are logged via the plugin runtime and
never break the synchronous record/lookup paths. Each plugin gets a
*ForConfig async variant used by handlers that already run in async
contexts; opt-in is gated by resolvePluginConfigObject(cfg, ...).

Tests: extends each plugin's cache test with no-op-when-disabled and
persistent register/lookup with a fake runtime store. Updates relevant
docs (channels/{slack,discord,msteams,matrix}.md, plugins/sdk-runtime.md)
and the changelog.
Address draft PR review feedback for the opt-in SDK-backed channel state
migration:

- backfill Slack and MS Teams in-memory caches after persistent lookup hits
  so hot inbound paths only pay the SQLite lookup once per key after restart
- reset cached persistent store handles from existing cache clear helpers so
  tests and runtime reset paths reopen stores from the current runtime
- use a narrow Matrix plugins config lookup type instead of a double cast
- document why Slack persists agentId even though current reads only check
  participation presence

Validation: pnpm test extensions/slack/src/sent-thread-cache.test.ts extensions/discord/src/components.test.ts extensions/msteams/src/sent-message-cache.test.ts extensions/matrix/src/approval-reactions.test.ts; pnpm check:changed.
Build the alternative default-on version of the bundled plugin state
migration. The four migrated caches now keep their existing in-memory fast
paths as the source of truth and automatically try the SDK-backed keyed
store once their plugin runtime is available. If opening SQLite or a store
operation fails, the plugin logs once for that path, disables further
persistent access for the process, and keeps the previous process-local
behavior.

This removes the experimentalPersistentState plugin config gate and the cfg
threading that only existed for that opt-in. Slack/MS Teams continue to
backfill memory after restart lookups; Discord keeps consume/delete
semantics; Matrix keeps versioned approval reaction targets. Tests cover
both available persistent stores and unavailable SQLite fallback.

Validation: targeted plugin cache tests, relevant call-site tests, oxfmt
check, and pnpm check:changed.
@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation channel: discord Channel integration: discord channel: matrix Channel integration: matrix channel: msteams Channel integration: msteams channel: slack Channel integration: slack plugin: file-transfer size: XL maintainer Maintainer-authored PR labels Apr 30, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 30, 2026

Codex review: needs maintainer review before merge.

What this changes:

The PR makes Slack, Discord, Microsoft Teams, and Matrix bundled plugin caches automatically use the SQLite-backed SDK keyed state store as best-effort restart durability behind their existing in-memory caches, with docs, changelog, and focused tests updated.

Maintainer follow-up before merge:

Protected maintainer-labeled draft PR needs human review to choose default-on versus opt-in persistence and confirm validation; there is no discrete repair task for an automated replacement PR.

Security review:

Security review cleared: Read-only diff review found no workflow, dependency, lockfile, lifecycle script, secret-handling, or new third-party code execution changes; the security-relevant change is persisted channel metadata using the existing bundled-plugin state store.

Review details

Best possible solution:

Have a maintainer choose the intended rollout model between this default-on PR and #74933, then land the selected bundled-plugin migration with the existing keyed-store SDK contract, focused cache tests, docs, and changed-gate proof.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a feature PR rather than a bug report. The motivating limitation is statically verifiable on current main: the targeted Slack, Discord, Microsoft Teams, and Matrix registries are process-local, so a Gateway restart drops their state.

Is this the best way to solve the issue?

Unclear pending maintainer review. The patch is a plausible narrow implementation of default-on best-effort persistence, but the existence of the opt-in alternative and the protected maintainer label mean this should not be treated as automatically merge-ready.

Acceptance criteria:

  • Review branch validation already reported by the author: targeted plugin cache tests, relevant call-site tests, oxfmt check, git diff check, and pnpm check:changed.
  • Before merge, run the repository-standard changed gate in Testbox for the selected persistence PR.

What I checked:

  • Protected PR routing: The provided GitHub context shows this PR is open, draft, contributor-authored, and labeled maintainer; repository policy requires keeping protected-label PRs open for explicit maintainer handling. (9f3748cce70e)
  • Current main Slack cache is process-local: Current main stores Slack thread participation in a global in-memory dedupe cache and has no persistent state path, which matches the PR's stated migration target. (extensions/slack/src/sent-thread-cache.ts:16, 58a0b077c1c5)
  • Current main Discord registry is process-local: Current main stores Discord component and modal entries in global maps with local TTL/consume behavior, so restart-safe component registry persistence is not already implemented on main. (extensions/discord/src/components-registry.ts:4, 58a0b077c1c5)
  • Current main Teams and Matrix caches are process-local: Microsoft Teams sent-message markers use a process-global map, and Matrix approval reaction targets use a module-level map on current main. (extensions/msteams/src/sent-message-cache.ts:1, 58a0b077c1c5)
  • PR implements best-effort persistence with fallback: The PR branch opens keyed stores from optional plugin runtimes, disables the persistent path after open/operation failures, and preserves in-memory lookup/write behavior for Slack, Discord, Teams, and Matrix. (extensions/slack/src/sent-thread-cache.ts:62, 9f3748cce70e)
  • SDK state-store contract supports this surface: Current main exposes runtime.state.openKeyedStore through the plugin runtime proxy and restricts it to bundled plugins, matching this PR's bundled-channel scope. (src/plugins/registry.ts:2025, 58a0b077c1c5)

Likely related people:

  • amknight: Remote history shows amknight introduced the SQLite-backed plugin state store in feat(plugins): add SQLite plugin state store #74190, the direct SDK dependency this PR consumes. (role: state-store foundation author and likely follow-up owner; confidence: high; commits: bbf985d50a64; files: src/plugin-state/plugin-state-store.ts, docs/plugins/sdk-runtime.md)
  • steipete: Remote file history shows recent work on plugin SDK/runtime boundaries and shared channel cache seams across the affected Slack, Discord, and Teams paths. (role: recent plugin SDK and channel-cache maintainer; confidence: high; commits: f0000ab72d01, 89d65521fe34, 78160b5f8813; files: extensions/slack/src/sent-thread-cache.ts, extensions/discord/src/components-registry.ts, extensions/msteams/src/sent-message-cache.ts)
  • gumadeiras: Remote path history points to the Matrix exec approval reaction shortcuts as the origin of the Matrix approval reaction target behavior this PR persists. (role: Matrix approval reaction behavior owner; confidence: high; commits: 0aaf7531481d; files: extensions/matrix/src/approval-reactions.ts, extensions/matrix/src/approval-handler.runtime.ts, extensions/matrix/src/matrix/monitor/reaction-events.ts)
  • TaKO8Ki: Remote path history points to the MSTeams sent-message dedupe storage change as the origin of the current marker cache this PR persists. (role: Microsoft Teams sent-message cache introducer; confidence: high; commits: 0bee3f337ae5; files: extensions/msteams/src/sent-message-cache.ts)
  • huntharo: Remote path history shows prior Discord component registry/expiration work, which is adjacent to the Discord registry persistence in this PR. (role: Discord component registry adjacent owner; confidence: medium; commits: e24bf22f985e; files: extensions/discord/src/components-registry.ts, extensions/discord/src/monitor/agent-components.handlers.ts)

Remaining risk / open question:

  • The PR is draft and protected by the maintainer label, so default-on persistence versus the opt-in design in channels: opt-in SDK-backed persistent state for slack, discord, msteams, matrix #74933 remains an owner/product decision.
  • The author reports targeted tests and pnpm check:changed passing, but this read-only review did not independently run Testbox or broad validation.
  • Persisting channel interaction metadata by default has storage/privacy tradeoffs that should be explicitly accepted by maintainers even though the code keeps SQLite failures best-effort.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 58a0b077c1c5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: discord Channel integration: discord channel: matrix Channel integration: matrix channel: msteams Channel integration: msteams channel: slack Channel integration: slack docs Improvements or additions to documentation maintainer Maintainer-authored PR plugin: file-transfer size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant