feat(plugins): add SQLite plugin state store#74190
Conversation
|
Codex review: needs maintainer review before merge. What this changes: The PR branch adds a SQLite-backed keyed plugin state store exposed as Maintainer follow-up before merge: Keep this PR open for explicit maintainer and security/runtime-storage review. The best path is to decide whether the bundled-only keyed-store API is the right plugin boundary, resolve or consciously accept the SQLite event-loop and permission-hardening tradeoffs, verify generated SDK docs/baseline and CI proof, and keep consumer migrations in separate owner-scoped PRs. Best possible solution: Keep this PR open for explicit maintainer and security/runtime-storage review. The best path is to decide whether the bundled-only keyed-store API is the right plugin boundary, resolve or consciously accept the SQLite event-loop and permission-hardening tradeoffs, verify generated SDK docs/baseline and CI proof, and keep consumer migrations in separate owner-scoped PRs. Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 443ca4865d61. |
Validation blitz — plugin state store E2E resultsFive standalone scripts exercised the plugin state store end-to-end against real SQLite on real filesystem. Every script used an isolated 1. CRUD + Persistence — 25/25 ✅
2. TTL + Sweep — 18/18 ✅
3. Isolation + Multi-plugin — 24/24 ✅
4. Limits + Eviction — 22/22 ✅
5. Stress + WAL — 10/10 ✅
Summary
No bugs found. The store is solid across all tested dimensions — correctness, isolation, limits, persistence, TTL semantics, eviction ordering, error handling, and stress. |
aadee11 to
56adb0d
Compare
Greptile SummaryThis PR introduces a SQLite-backed keyed-state store for bundled plugins, exposed as Confidence Score: 4/5Safe to merge; all findings are non-blocking style/consistency issues with no correctness impact. Only P2-level findings: two write operations (delete, clear) silently skip the permission-hardening path that register/consume use, a redundant ensureSchema call in the probe function, and a minor allocation pattern in the Proxy state handler. No logic bugs or data-loss risks identified. src/plugin-state/plugin-state-store.sqlite.ts — the pluginStateDelete and pluginStateClear functions should be reviewed for permission-hardening consistency. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/plugin-state/plugin-state-store.sqlite.ts
Line: 477-494
Comment:
**`delete` and `clear` skip permission hardening**
`pluginStateDelete` and `pluginStateClear` (line 518) call `openPluginStateDatabase` directly and issue writes without going through `runWriteTransaction`. This means they never call `ensurePluginStatePermissions`, which is the step that tightens file permissions on `0o600` after every write. `register` and `consume` always get permission hardening via `runWriteTransaction`; these two silently skip it. If a file ends up world-readable after an external process touches it, a `delete` or `clear` will leave it that way whereas a `register` would re-tighten it.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/plugin-state/plugin-state-store.sqlite.ts
Line: 596-600
Comment:
**Redundant `ensureSchema` call in probe**
`openPluginStateDatabase` already calls `ensureSchema` internally (line 319). Calling it again on line 599 inside `probePluginStateStore` is harmless because `CREATE TABLE IF NOT EXISTS` is idempotent, but it's dead work and may confuse readers into thinking the probe step does additional schema validation. Consider removing this second call or renaming the probe step to `"open-and-schema"` if you want a unified step name.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/plugins/registry.ts
Line: 1949-1975
Comment:
**`runtime.state` allocates a new object on every property access**
The Proxy's `get` trap for `"state"` spreads `baseState` and creates a fresh `openKeyedStore` closure each time `runtime.state` is accessed. The proxy itself is cached per `pluginId` in `pluginRuntimeById`, but the returned state object isn't. For plugins that access `runtime.state` in registration callbacks this is harmless, but it's easy to inadvertently call `openKeyedStore` in a hot path (e.g., inside a hook handler that accesses `api.runtime.state` on every event). Memoizing the state object inside the proxy closure — as is already done for `subagent` via `defineCachedValue` — would make this consistent with the rest of the runtime API.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(plugins): keep committed state write..." | Re-trigger Greptile |
| export function pluginStateDelete(params: { | ||
| pluginId: string; | ||
| namespace: string; | ||
| key: string; | ||
| }): boolean { | ||
| try { | ||
| const { statements } = openPluginStateDatabase("delete"); | ||
| const result = statements.deleteEntry.run(params.pluginId, params.namespace, params.key); | ||
| return result.changes > 0; | ||
| } catch (error) { | ||
| throw wrapPluginStateError( | ||
| error, | ||
| "delete", | ||
| "PLUGIN_STATE_WRITE_FAILED", | ||
| "Failed to delete plugin state entry.", | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
delete and clear skip permission hardening
pluginStateDelete and pluginStateClear (line 518) call openPluginStateDatabase directly and issue writes without going through runWriteTransaction. This means they never call ensurePluginStatePermissions, which is the step that tightens file permissions on 0o600 after every write. register and consume always get permission hardening via runWriteTransaction; these two silently skip it. If a file ends up world-readable after an external process touches it, a delete or clear will leave it that way whereas a register would re-tighten it.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugin-state/plugin-state-store.sqlite.ts
Line: 477-494
Comment:
**`delete` and `clear` skip permission hardening**
`pluginStateDelete` and `pluginStateClear` (line 518) call `openPluginStateDatabase` directly and issue writes without going through `runWriteTransaction`. This means they never call `ensurePluginStatePermissions`, which is the step that tightens file permissions on `0o600` after every write. `register` and `consume` always get permission hardening via `runWriteTransaction`; these two silently skip it. If a file ends up world-readable after an external process touches it, a `delete` or `clear` will leave it that way whereas a `register` would re-tighten it.
How can I resolve this? If you propose a fix, please make it concise.| try { | ||
| const store = openPluginStateDatabase("probe"); | ||
| pushOk("open"); | ||
| ensureSchema(store.db, store.path); | ||
| pushOk("schema"); |
There was a problem hiding this comment.
Redundant
ensureSchema call in probe
openPluginStateDatabase already calls ensureSchema internally (line 319). Calling it again on line 599 inside probePluginStateStore is harmless because CREATE TABLE IF NOT EXISTS is idempotent, but it's dead work and may confuse readers into thinking the probe step does additional schema validation. Consider removing this second call or renaming the probe step to "open-and-schema" if you want a unified step name.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugin-state/plugin-state-store.sqlite.ts
Line: 596-600
Comment:
**Redundant `ensureSchema` call in probe**
`openPluginStateDatabase` already calls `ensureSchema` internally (line 319). Calling it again on line 599 inside `probePluginStateStore` is harmless because `CREATE TABLE IF NOT EXISTS` is idempotent, but it's dead work and may confuse readers into thinking the probe step does additional schema validation. Consider removing this second call or renaming the probe step to `"open-and-schema"` if you want a unified step name.
How can I resolve this? If you propose a fix, please make it concise.| @@ -1952,10 +1958,27 @@ export function createPluginRegistry(registryParams: PluginRegistryParams) { | |||
| } | |||
| const runtime = new Proxy(registryParams.runtime, { | |||
| get(target, prop, receiver) { | |||
| if (prop === "state") { | |||
| const baseState = Reflect.get(target, prop, receiver) as PluginRuntime["state"]; | |||
| return { | |||
| ...baseState, | |||
| openKeyedStore: <T>(options: OpenKeyedStoreOptions): PluginStateKeyedStore<T> => { | |||
| const record = | |||
| pluginRuntimeRecordById.get(pluginId) ?? | |||
| registry.plugins.find((entry) => entry.id === pluginId); | |||
| if (record?.origin !== "bundled") { | |||
| throw new Error( | |||
| "openKeyedStore is only available for bundled plugins in this release.", | |||
| ); | |||
| } | |||
| return createPluginStateKeyedStore<T>(pluginId, options); | |||
| }, | |||
There was a problem hiding this comment.
runtime.state allocates a new object on every property access
The Proxy's get trap for "state" spreads baseState and creates a fresh openKeyedStore closure each time runtime.state is accessed. The proxy itself is cached per pluginId in pluginRuntimeById, but the returned state object isn't. For plugins that access runtime.state in registration callbacks this is harmless, but it's easy to inadvertently call openKeyedStore in a hot path (e.g., inside a hook handler that accesses api.runtime.state on every event). Memoizing the state object inside the proxy closure — as is already done for subagent via defineCachedValue — would make this consistent with the rest of the runtime API.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/registry.ts
Line: 1949-1975
Comment:
**`runtime.state` allocates a new object on every property access**
The Proxy's `get` trap for `"state"` spreads `baseState` and creates a fresh `openKeyedStore` closure each time `runtime.state` is accessed. The proxy itself is cached per `pluginId` in `pluginRuntimeById`, but the returned state object isn't. For plugins that access `runtime.state` in registration callbacks this is harmless, but it's easy to inadvertently call `openKeyedStore` in a hot path (e.g., inside a hook handler that accesses `api.runtime.state` on every event). Memoizing the state object inside the proxy closure — as is already done for `subagent` via `defineCachedValue` — would make this consistent with the rest of the runtime API.
How can I resolve this? If you propose a fix, please make it concise.
Validation blitz — round 2: robustness, contention, memory, and proxy pathFive additional standalone scripts targeting gaps identified after the first 109-check round. All run against real SQLite on real filesystem with isolated state dirs. 121 additional checks, 121 passed, 0 failed. 1. Multi-process contention — 8/8 ✅6 child processes, each doing 250 mixed ops (1,500 total) against the same SQLite database file simultaneously.
WAL mode + 2. Close during in-flight writes — 38/38 ✅
3. Database integrity + WAL checkpoint — 30/30 ✅
4. RSS memory growth — 2/2 ✅
No memory leak detected. RSS growth is modest and sublinear across 30,000 operations and 100 distinct store instances. 5. Runtime proxy path + bundled-only gate — 15/15 ✅
Combined totals (round 1 + round 2)
No bugs found across 236 checks covering correctness, isolation, limits, persistence, TTL, eviction, proxy binding, multi-process contention, close safety, database integrity, WAL checkpointing, memory growth, and stress. |
Review feedback responseAddressed findings from Aisle, Greptile, and self-review. All changes are local — not yet pushed. Fixed
Tests added for all three limit types (key bytes, namespace bytes, nesting depth). All 31 tests pass (17 unit + 13 e2e + 1 permissions). Declined
|
9173ca3 to
dfd3973
Compare
Validation blitz — round 3: full gateway E2E lifecycleStandalone script exercising the plugin state store through a real gateway process — start, write state, stop, verify persistence, restart, verify again. 21 checks, 21 passed, 0 failed. What this proves that rounds 1–2 didn'tRounds 1–2 (236 checks) exercised the store API directly and through the registry proxy with a mock runtime. This round starts the actual gateway binary ( Results — 21/21 ✅
Key observations
Combined totals (all 3 rounds)
|
ba799c3 to
1f320e3
Compare
Add key byte-length (512B), namespace byte-length (128B), and JSON nesting depth (64) limits to prevent resource exhaustion. Reject ttlMs < 1 to eliminate immediately-expired entry trap. Move validation plan to docs/reference/.
Move plugin state DB from shared ~/.openclaw/plugins/ to dedicated ~/.openclaw/plugin-state/ so permission hardening doesn't touch the plugin install directory. Remove unnecessary type assertions flagged by check-lint. Thread correct operation into invalidInput() so validation errors from lookup/delete/consume/open report the actual operation instead of hardcoding 'register'. Rename overclaiming E2E test name.
1f320e3 to
fe457b2
Compare
* feat(plugins): add experimental sqlite plugin state store
* feat(plugins): add experimental sqlite plugin state store
Summary
api.runtime.state.openKeyedStorefor bundled plugins.Planned follow-up consumers
This PR ships the SQLite-backed runtime store foundation. Consumer migrations should be separate PRs so each plugin can preserve its current semantics.
Suggested order:
slack.thread-participation, keyed by${accountId}:${channelId}:${threadTs}, storing{ agentId, repliedAt }with the Slack TTL/cap policy. The main migration decision is thatrecordSlackThreadParticipationis currently synchronous, so the follow-up should either make that path async or keep the in-memory hot path and fire-and-forget the durable write. Keep persistence separate from [Bug]: SlackrequireMention: trueis bypassed for thread replies after prior bot participation #64277 /thread.requireExplicitMentionpolicy semantics.discord.componentsanddiscord.modals. Preserve the current peek-vs-consume behavior and persist only the safe public registry shape.extensions/msteams/src/sent-message-cache.tsandextensions/msteams/src/monitor-handler/message-handler.tscurrently keep a 24h process-global map of bot-sent message ids so replies can be classified asreply_to_bot. Restart drops that state, unlike Telegram's analogous file-backed sent-message cache. Migrate to a namespace such asmsteams.sent-messages, keyed by${conversationId}:${messageId}, with the same TTL.extensions/matrix/src/approval-reactions.ts,extensions/matrix/src/approval-handler.runtime.ts, andextensions/matrix/src/matrix/monitor/reaction-events.tscurrently keeproomId:eventId -> { approvalId, allowedDecisions }in memory. Restart can leave a visible approval prompt whose reactions no longer resolve. Migrate the narrow reaction-target metadata to a Matrix approval namespace without broadening the stored approval payload.register/lookup/consumeAPI because its dedupe contract has claim/commit/release and same-process memory semantics. Follow-up should either share this store's path/locking/atomic hardening or add a keyed-store-level atomic claim/check primitive first.Validation
New unit and end to end tests. Full automated blitz, stress testing the solution - see findings in below comment.