Skip to content

feat(tasks): move task ledger to sqlite and add audit CLI#57361

Merged
vincentkoc merged 7 commits intomainfrom
feat/task-sqlite-audit
Mar 30, 2026
Merged

feat(tasks): move task ledger to sqlite and add audit CLI#57361
vincentkoc merged 7 commits intomainfrom
feat/task-sqlite-audit

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • Problem: the shared task ledger still used full-snapshot JSON persistence, which rewrote the whole registry on every mutation and kept the backend contract too weak for cleanup and operator inspection.
  • Why it matters: tasks are now shared across ACP, cron, subagents, and CLI work, so we need durable indexed storage and a first-class way to spot stale or broken runs before the ledger turns into opaque clutter.
  • What changed: moved the task registry to SQLite-only incremental persistence, removed the JSON backend path entirely, and added openclaw tasks audit for stale, lost, delivery-failed, and inconsistent task runs.
  • What did NOT change (scope boundary): no JSON migration, no Redis/Temporal executor work, no cron schedule state in tasks, and no subsystem-specific metadata blobs in the shared task rows.

AI-assisted: yes
Testing: fully tested on the touched surface

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause / Regression History (if applicable)

  • Root cause: the task registry was still modeled as a JSON snapshot store, so every mutation rewrote the full ledger and there was no clean operator surface for stale or broken shared runs.
  • Missing detection / guardrail: we had no task-audit read model or CLI surface to flag stuck/lost cleanup issues.
  • Prior context (git blame, prior PR, issue, or refactor if known): this follows the unreleased shared task-run registry work and the status-pressure follow-up merged in feat(status): surface task run pressure #57350.
  • Why this regressed now: once tasks started spanning ACP, cron, subagents, and CLI, the old persistence seam became the weakest part of the design.
  • If unknown, what was ruled out: N/A.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/tasks/task-registry.store.test.ts
    • src/tasks/task-registry.audit.test.ts
    • src/commands/tasks.test.ts
    • src/cli/program/register.status-health-sessions.test.ts
    • src/tasks/task-registry.test.ts
  • Scenario the test should lock in: sqlite-backed restore works, the registry still behaves the same at the task API seam, and stale/lost audit findings remain visible through the CLI.
  • Why this is the smallest reliable guardrail: these tests cover the new persistence seam and the new operator surface without needing broader channel/runtime E2E.
  • Existing test that already covers this (if any): src/tasks/task-registry.test.ts covers the shared lifecycle semantics.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

  • openclaw tasks now reads from a SQLite-backed shared task ledger.
  • openclaw tasks audit reports stale, lost, delivery-failed, missing-cleanup, and timestamp-inconsistent task runs.
  • The task registry no longer supports the old JSON backend path.

Diagram (if applicable)

Before:
[task mutation] -> [rewrite runs.json snapshot] -> [operator only sees list/show]

After:
[task mutation] -> [upsert/delete task_runs row in sqlite] -> [retention + audit surfaces] -> [operator sees stale/broken runs]

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (Yes)
  • If any Yes, explain risk + mitigation: task persistence now uses a local SQLite file instead of a JSON snapshot. Scope remains local-only under the state dir, with the same task data as before and no new remote access.

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 / pnpm
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): default local state dir

Steps

  1. Create or update shared task runs through ACP/cron/subagent/CLI paths.
  2. Restart the task registry process and inspect openclaw tasks.
  3. Run openclaw tasks audit against stale or broken task fixtures.

Expected

  • Task runs restore from SQLite, cleanup semantics stay intact, and audit findings are reported for stale or broken runs.

Actual

  • Matches expected in the touched test/build lanes.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • pnpm test -- src/tasks/task-registry.store.test.ts src/tasks/task-registry.audit.test.ts src/commands/tasks.test.ts src/cli/program/register.status-health-sessions.test.ts
    • pnpm test -- src/tasks/task-registry.test.ts
    • pnpm build
    • pnpm check
  • Edge cases checked:
    • default sqlite restore path
    • custom task store hook behavior still works
    • audit output for stale running/lost/missing cleanup
    • tasks CLI registration and filters
  • What you did not verify:
    • full end-to-end ACP/cron runtime execution against a live state dir

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (No)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: SQLite availability depends on node:sqlite in the runtime.
    • Mitigation: fail fast with an explicit actionable error if the builtin module is unavailable.
  • Risk: audit thresholds could be noisy for some long-running work.
    • Mitigation: audit is read-only and derived; it does not change lifecycle state or cleanup semantics.

@vincentkoc vincentkoc self-assigned this Mar 30, 2026
@openclaw-barnacle openclaw-barnacle bot added cli CLI command changes commands Command implementations size: XL maintainer Maintainer-authored PR labels Mar 30, 2026
@vincentkoc vincentkoc marked this pull request as ready for review March 30, 2026 00:32
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR migrates the shared task ledger from full-snapshot JSON persistence to incremental SQLite storage, and adds a new openclaw tasks audit CLI subcommand for surfacing stale, lost, delivery-failed, and timestamp-inconsistent task runs. The design is solid: WAL mode with a busy timeout handles multi-process access, individual row upsert/delete helpers bypass the expensive full-snapshot path, the node:sqlite builtin is wrapped to surface actionable errors on unsupported runtimes, and test-env path isolation is preserved correctly.

Two design points are worth a look before merging:

  • Double findings for lost tasks (src/tasks/task-registry.audit.ts): a task with status === "lost" triggers both a lost finding (error) and a missing_cleanup finding (warn), because the missing-cleanup check does not exclude the lost status. The second finding is redundant — operators should address the root cause first — and may inflate audit output in environments with many stale lost tasks.
  • Audit summary reflects the filtered set (src/commands/tasks.ts): summarizeTaskAuditFindings is called on the post-filter findings rather than the full universe, so --severity error will report "N findings · N errors · 0 warnings" even if there are outstanding warnings. The printed filter lines partially mitigate this, but computing the summary from the unfiltered set would give a more accurate overall health picture.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/UX concerns that do not affect data integrity or correctness.

No P0 or P1 bugs found. The SQLite store is correctly implemented with WAL mode, proper index coverage, autocommit for individual row ops, and a transactional full-snapshot fallback. The two P2 items (redundant missing_cleanup for lost tasks, summary counted on filtered set) are operator UX concerns that do not break any functional path.

src/tasks/task-registry.audit.ts (missing_cleanup double-flag for lost tasks) and src/commands/tasks.ts (audit summary scoped to filtered set).

Important Files Changed

Filename Overview
src/tasks/task-registry.store.sqlite.ts New SQLite store using node:sqlite with WAL mode, individual row upsert/delete helpers, a full-snapshot fallback wrapped in BEGIN IMMEDIATE, and useful indices. Singleton pattern handled correctly with lazy re-open after close.
src/tasks/task-registry.audit.ts New read-only audit module. The missing_cleanup check fires for lost tasks as well, creating a redundant second finding. Sorting and summarization logic are otherwise sound.
src/commands/tasks.ts Adds tasksAuditCommand with severity/code/limit filtering. Summary counts are computed from the post-filter set rather than the full audit universe, which can be misleading when filters are active.
src/tasks/task-registry.store.ts Store interface extended with optional upsertTask/deleteTask/close hooks. Default store swapped to SQLite. resetTaskRegistryRuntimeForTests now correctly closes the SQLite handle before resetting.
src/tasks/task-registry.ts persistTaskRegistry() calls replaced with granular persistTaskUpsert/persistTaskDelete helpers using incremental SQLite ops when available, falling back to full snapshot for custom stores. Logic is correct.
src/tasks/task-registry.paths.ts Path helpers extracted from the deleted JSON store. Test-env isolation via VITEST/NODE_ENV check is preserved correctly.
src/infra/node-sqlite.ts Thin wrapper that dynamically requires node:sqlite and surfaces a clear actionable error if the builtin module is missing.
src/tasks/task-registry.store.json.ts Deleted as part of the JSON-to-SQLite migration. Expected removal.

Comments Outside Diff (1)

  1. src/tasks/task-registry.audit.ts, line 643-681 (link)

    P2 lost tasks generate redundant missing_cleanup finding

    A task with status === "lost" will always produce two separate findings in the same audit pass: one lost finding (error) and one missing_cleanup finding (warn), because "lost" is neither "queued" nor "running" and a lost task is unlikely to have cleanupAfter set.

    The missing_cleanup warning adds noise without actionable signal in this case — the operator should address the lost root cause first. Consider excluding lost (and similarly abandoned / expired) statuses from the missing_cleanup check, or at least documenting the intentional double-flagging.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/tasks/task-registry.audit.ts
    Line: 643-681
    
    Comment:
    **`lost` tasks generate redundant `missing_cleanup` finding**
    
    A task with `status === "lost"` will always produce two separate findings in the same audit pass: one `lost` finding (error) and one `missing_cleanup` finding (warn), because `"lost"` is neither `"queued"` nor `"running"` and a lost task is unlikely to have `cleanupAfter` set.
    
    The `missing_cleanup` warning adds noise without actionable signal in this case — the operator should address the `lost` root cause first. Consider excluding `lost` (and similarly `abandoned` / `expired`) statuses from the `missing_cleanup` check, or at least documenting the intentional double-flagging.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/tasks/task-registry.audit.ts
Line: 643-681

Comment:
**`lost` tasks generate redundant `missing_cleanup` finding**

A task with `status === "lost"` will always produce two separate findings in the same audit pass: one `lost` finding (error) and one `missing_cleanup` finding (warn), because `"lost"` is neither `"queued"` nor `"running"` and a lost task is unlikely to have `cleanupAfter` set.

The `missing_cleanup` warning adds noise without actionable signal in this case — the operator should address the `lost` root cause first. Consider excluding `lost` (and similarly `abandoned` / `expired`) statuses from the `missing_cleanup` check, or at least documenting the intentional double-flagging.

```suggestion
    if (
      task.status !== "queued" &&
      task.status !== "running" &&
      task.status !== "lost" &&
      typeof task.cleanupAfter !== "number"
    ) {
      findings.push(
        createFinding({
          severity: "warn",
          code: "missing_cleanup",
          task,
          ageMs,
          detail: "terminal task is missing cleanupAfter",
        }),
      );
    }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/commands/tasks.ts
Line: 329

Comment:
**Audit summary reflects filtered set, not the total universe**

`summarizeTaskAuditFindings` is called on the already-filtered `findings` list. This means the printed header (`Task audit: N findings · M errors · P warnings`) counts only what matched the `--severity`/`--code` filters, not the full audit. An operator using `--severity error` to triage will see "2 findings · 2 errors · 0 warnings" even if there are 10 warnings in the registry, which can give a misleading picture of overall health.

Consider computing the summary from the full unfiltered findings:

```suggestion
  const allFindings = listTaskAuditFindings();
  const findings = allFindings.filter((finding) => {
    if (severityFilter && finding.severity !== severityFilter) {
      return false;
    }
    if (codeFilter && finding.code !== codeFilter) {
      return false;
    }
    return true;
  });
  const limit = typeof opts.limit === "number" && opts.limit > 0 ? opts.limit : undefined;
  const displayed = limit ? findings.slice(0, limit) : findings;
  const summary = summarizeTaskAuditFindings(allFindings);
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge branch 'main' into feat/task-sqlit..." | Re-trigger Greptile

const limit = typeof opts.limit === "number" && opts.limit > 0 ? opts.limit : undefined;
const displayed = limit ? findings.slice(0, limit) : findings;
const summary = summarizeTaskAuditFindings(findings);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Audit summary reflects filtered set, not the total universe

summarizeTaskAuditFindings is called on the already-filtered findings list. This means the printed header (Task audit: N findings · M errors · P warnings) counts only what matched the --severity/--code filters, not the full audit. An operator using --severity error to triage will see "2 findings · 2 errors · 0 warnings" even if there are 10 warnings in the registry, which can give a misleading picture of overall health.

Consider computing the summary from the full unfiltered findings:

Suggested change
const allFindings = listTaskAuditFindings();
const findings = allFindings.filter((finding) => {
if (severityFilter && finding.severity !== severityFilter) {
return false;
}
if (codeFilter && finding.code !== codeFilter) {
return false;
}
return true;
});
const limit = typeof opts.limit === "number" && opts.limit > 0 ? opts.limit : undefined;
const displayed = limit ? findings.slice(0, limit) : findings;
const summary = summarizeTaskAuditFindings(allFindings);
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/commands/tasks.ts
Line: 329

Comment:
**Audit summary reflects filtered set, not the total universe**

`summarizeTaskAuditFindings` is called on the already-filtered `findings` list. This means the printed header (`Task audit: N findings · M errors · P warnings`) counts only what matched the `--severity`/`--code` filters, not the full audit. An operator using `--severity error` to triage will see "2 findings · 2 errors · 0 warnings" even if there are 10 warnings in the registry, which can give a misleading picture of overall health.

Consider computing the summary from the full unfiltered findings:

```suggestion
  const allFindings = listTaskAuditFindings();
  const findings = allFindings.filter((finding) => {
    if (severityFilter && finding.severity !== severityFilter) {
      return false;
    }
    if (codeFilter && finding.code !== codeFilter) {
      return false;
    }
    return true;
  });
  const limit = typeof opts.limit === "number" && opts.limit > 0 ? opts.limit : undefined;
  const displayed = limit ? findings.slice(0, limit) : findings;
  const summary = summarizeTaskAuditFindings(allFindings);
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2f55dfc35

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +281 to +283
mkdirSync(resolveTaskRegistryDir(process.env), { recursive: true });
const { DatabaseSync } = requireNodeSqlite();
const db = new DatabaseSync(pathname);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Harden SQLite task store file and directory modes

The SQLite task store now creates tasks/ with mkdirSync(..., { recursive: true }) and opens runs.sqlite without enforcing permissions, so access falls back to the process umask (often world-readable defaults like 0755/0644). The previous JSON persistence path explicitly wrote 0700 directories and 0600 files, so this is a regression: on shared machines or permissive OPENCLAW_STATE_DIR paths, other local users can read task metadata (including session identifiers and task details). Please set explicit restrictive modes for both the directory and database file after creation.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0fb3862520

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +175 to +178
if (
task.status !== "queued" &&
task.status !== "running" &&
typeof task.cleanupAfter !== "number"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exclude synthetic lost tasks from missing_cleanup checks

listTaskAuditFindings() defaults to reconcileInspectableTasks(), which can return projected lost tasks before maintenance has persisted cleanupAfter (see reconcileTaskRecordForOperatorInspection in src/tasks/task-registry.maintenance.ts). This condition treats every non-queued/non-running task without cleanupAfter as missing_cleanup, so projected lost tasks are double-reported as both lost and missing_cleanup, creating noisy false positives in tasks audit output.

Useful? React with 👍 / 👎.

@vincentkoc vincentkoc merged commit e57b361 into main Mar 30, 2026
18 of 24 checks passed
@vincentkoc vincentkoc deleted the feat/task-sqlite-audit branch March 30, 2026 02:34
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca6d601fe3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +40 to +41
loadSnapshot: loadTaskRegistrySnapshotFromSqlite,
saveSnapshot: saveTaskRegistrySnapshotToSqlite,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep a fallback when selecting the default task store

Making the default registry backend SQLite-only here means every task-registry code path now hard-depends on node:sqlite; in Bun-based runs this module is not available (No such built-in module: node:sqlite), so task-related commands and background-task paths fail instead of degrading gracefully as they did with JSON persistence. This is a functional regression for supported Bun workflows, so the default store selection should include a runtime-capability fallback.

Useful? React with 👍 / 👎.

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot bot commented Mar 30, 2026

🤖 We're reviewing this PR with Aisle

We're running a security check on the changes in this PR now. This usually takes a few minutes. ⌛
We'll post the results here as soon as they're ready.

Progress:

  • Analysis
  • Finalization

Latest run failed. Keeping previous successful results. Trace ID: 019d3c288c735f9b1a8b757f33bad919.

Last updated on: 2026-03-30T09:47:27Z

pritchie pushed a commit to pritchie/openclaw that referenced this pull request Mar 30, 2026
…7361)

* feat(tasks): move task ledger to sqlite

* feat(tasks): add task run audit command

* style(tasks): normalize audit command formatting

* fix(tasks): address audit summary and sqlite perms

* fix(tasks): avoid duplicate lost audit findings
alexjiang1 pushed a commit to alexjiang1/openclaw that referenced this pull request Mar 31, 2026
…7361)

* feat(tasks): move task ledger to sqlite

* feat(tasks): add task run audit command

* style(tasks): normalize audit command formatting

* fix(tasks): address audit summary and sqlite perms

* fix(tasks): avoid duplicate lost audit findings
pgondhi987 pushed a commit to pgondhi987/openclaw that referenced this pull request Mar 31, 2026
…7361)

* feat(tasks): move task ledger to sqlite

* feat(tasks): add task run audit command

* style(tasks): normalize audit command formatting

* fix(tasks): address audit summary and sqlite perms

* fix(tasks): avoid duplicate lost audit findings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli CLI command changes commands Command implementations maintainer Maintainer-authored PR size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant