feat: add temporal lcm_recent rollups (1/10)#516
feat: add temporal lcm_recent rollups (1/10)#516100yenadmin wants to merge 55 commits intoMartian-Engineering:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a temporal rollup layer to lossless-claw and exposes it via a new lcm_recent tool (plus an opt-in lcm_rollup_debug tool), enabling deterministic day/week/month recaps and bounded sub-day window recall with provenance.
Changes:
- Introduces SQLite schema + store + builder for day/week/month rollups with provenance (
lcm_rollups,lcm_rollup_sources,lcm_rollup_state). - Adds
lcm_recentfor deterministic period/window parsing and bounded leaf-summary SQL fallback when rollups are not appropriate/available. - Wires rollup rebuilds into ingest/maintenance, adds operator-only
lcm_rollup_debug, updates docs/changesets, and adds comprehensive test coverage.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/db/migration.ts |
Adds rollup tables/indexes/views and rollup-state columns. |
src/store/rollup-store.ts |
Implements rollup/state CRUD + provenance source replacement and leaf-summary lookups. |
src/rollup-builder.ts |
Builds daily rollups from leaf summaries and aggregates into week/month rollups. |
src/tools/lcm-recent-tool.ts |
Implements lcm_recent period/window parsing, rollup selection, and bounded SQL fallback rendering/budgeting. |
src/tools/lcm-rollup-debug-tool.ts |
Implements lcm_rollup_debug inspection with source redaction unless includeSources=true. |
src/engine.ts |
Instantiates rollup services; marks conversations dirty on ingest; triggers rebuilds after maintenance. |
src/plugin/index.ts |
Registers lcm_recent by default and lcm_rollup_debug only when enabled. |
src/db/config.ts |
Adds rollupDebugEnabled config + env var resolver. |
openclaw.plugin.json |
Documents/configures rollupDebugEnabled in plugin settings schema. |
docs/configuration.md |
Documents LCM_ROLLUP_DEBUG_ENABLED / rollupDebugEnabled. |
README.md |
Adds LCM_ROLLUP_DEBUG_ENABLED to env var table. |
skills/lossless-claw/references/recall-tools.md |
Updates recall-tool guidance with availability gating + “recap vs proof” guardrails. |
skills/lossless-claw/SKILL.md |
Updates skill overview and guidance to use lcm_recent only when available. |
specs/lcm-temporal-memory-plan.md |
Adds/records consolidated implementation plan and tool contract. |
test/rollup-store-builder.test.ts |
Adds extensive coverage for schema, DST windows, fallback behavior, aggregation, and debug redaction/caps. |
.changeset/lcm-temporal-rollups.md |
Declares minor release for temporal rollups/tools. |
.changeset/lcm-temporal-skill-docs.md |
Declares patch release for skill/doc guidance updates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
title: "LCM upstream PR stack scenario flows"
LCM upstream PR stack scenario flowsThis document pressure-tests the planned LCM stack against realistic agent workflows. The goal is not just “does the schema exist?” The goal is: when a user asks a natural operational question, which tool does the agent call, what layer owns the answer, when does it escalate, and what must never happen silently? Layer model being tested
Scenario 1 — “What did we get done yesterday?”User asks
Ideal agent flow
Expected answer shapeOwnership boundary
Implementation implications#516 must make yesterday’s rollup cheap and stable. #517 must expose density without pretending to be a task system. Scenario 2 — “Where are we on the upstream LCM PRs?”User asks
Ideal agent flow
Expected answer shapeOwnership boundary
Implementation implicationsObserved work should include source/period/topic filters, but should not replace live GitHub checks. Scenario 3 — “What’s blocked?”User asks
Ideal agent flow
Expected answer shapeOwnership boundary
Implementation implicationsThis is the strongest justification for #517. Users ask “what’s blocked?” more often than “show me a rollup row.” Scenario 4 — “What happened yesterday between 4 and 8pm?”User asks
Ideal agent flow
Expected answer shapeOwnership boundary
Implementation implicationsSub-day windows are Patch 4, not a day-one dependency. They require careful timezone/DST tests. Scenario 5 — “Make me a handoff for the next agent”User asks
Ideal agent flow
Expected artifact sections
Ownership boundary
Implementation implicationsThis validates that LCM recall plus observed work can produce operationally useful handoffs while still verifying mutable external state live. Scenario 6 — “Should we create tasks for these unfinished things?”User asks
Ideal agent flow
Expected answer shapeOwnership boundary
Implementation implicationsThis is #518’s entire point. It must prove restraint. Scenario 7 — Missing rollup / cold startUser asks
Bad behavior to avoid
Ideal agent flow
Expected answer shapeOwnership boundaryReads do not mutate. Build is explicit. Implementation implications#516 needs structured missing/degraded responses and a separate build path. Scenario 8 — “Did we already file this upstream?”User asks
Ideal agent flow
Expected answer shapeOwnership boundaryGitHub owns “filed upstream.” LCM only remembers the plan. Implementation implicationsThis scenario proves agents must combine LCM with live external checks. Scenario 9 — “What should I read first?”User/agent asks
Ideal agent flow
Expected answer shapeOwnership boundaryDocs guide work; PR diffs and tests verify implementation. Scenario 10 — “How does this avoid becoming a second memory/task brain?”User asks
Expected answerImplementation implicationsThe words matter. Tool outputs and docs must consistently say “observed,” “appears,” “evidence,” and “confidence,” not “task is done.” Scenario matrix
Resulting design pressure on PRs#516 must provide
#517 must provide
#518 must provide
Final takeawayThe stack works if each layer answers one narrow question:
If any layer starts silently mutating the next layer, the architecture fails. |
Final adversarial pass - merge order 1/10Score: mergeability 9.8/10, functionality 9.4/10, description 9.7/10. Functionality audit: Works as the temporal spine. Scenario coverage: Supports temporal recap and “yesterday 4-8pm” style entry flows. It intentionally does not prove exact commands, SHAs, root cause, shipped status, or task status. Final state: Residual caveat: |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Final May 1 hardening/status update:
The PR description top note has been refreshed with current head SHA, visible-vs-logical LOC breakdown, scope, caveats, and validation status. LCM remains an evidence layer: recaps, observed work, events, episodes, and suggestions are entry points, not proof or task authority. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ve ints Round 1's P2-2 clamped the env path via \`parseFinitePositiveInt\`, but the plugin-config path still resolved through plain \`toNumber\` and silently accepted 0 / negative — manifest's \`minimum:1\` invariant held only when the env var was the source of truth. Add \`toPositiveNumber\` and route the three rollup token caps through it so both config sources enforce the same floor. Closes R2-FIX-5 from audit/pr516/round2-A-fix-correctness.md (P2-2 caveat).
R3-FIX-1: both queries `ORDER BY julianday(...) DESC LIMIT 1001` lacked a PK tiebreaker. Deterministic within a single SQLite instance via rowid stability, but breaks across `.dump | sqlite3 newdb` reimport — two rows with identical timestamps flip ordering after the round-trip. Adds `, summary_id ASC` to the summaries fallback query and `, m.message_id ASC` to the messages fallback query. Mirrors the c192ee8 fix for listConversationsBySessionKey (P1-9). Exposes getRecentSummaryFallback via __lcmRecentTestInternals so the regression test can drive it directly. Adds a regression test that inserts two summaries with identical latest_at/created_at in reverse alphabetical insertion order and asserts the ASC tiebreak puts the lexicographically smaller summary_id first across repeated invocations.
…TE tx R3-FIX-2: P1-8 closed the FK-consistency hole in buildAggregateRollup and the non-empty buildDayRollup path, but the empty-day cleanup branches still ran `getRollup` + `deleteRollup` outside any transaction. A concurrent rebuild that upserts a fresh rollup at the same (conversation, kind, key) slot had its work destroyed by a stale-empty-snapshot delete from a racing caller. Self-healing via pending_rebuild=1, but transient data loss until next maintain. Wraps both empty-day cleanup sites — one in buildDailyRollups' main loop, one in buildDayRollup itself — in `withDatabaseTransaction(this.store.db, "BEGIN IMMEDIATE", …)` and reads getRollup INSIDE the transaction so the decision to delete is made against the row that's actually there at lock time. Adds a structural regression test (mirrors P1-8's source-text scan) asserting that in both empty-day branches `getRollup(` and `deleteRollup(` appear AFTER `withDatabaseTransaction(`. Per Round-1 spec, deterministic concurrency repros are hard; the structural assertion is the accepted approach.
R3-FIX-3: the original assertion `details.tokenCount <= 500` would also pass on the pre-R2-FIX-2 code because the outer enforceResponseBudget at lcm-recent-tool.ts:2133 unconditionally clamps responses to budget. The fix is correct; the test was tautological (R3-H ⚠ from audit/pr516/SUMMARY_R3.md). Strengthens the test in three ways: 1. Bulks the fixture to six prior-day digests + four live-fallback leaves so combinedContent (the inner index assembly) genuinely overflows the 500-token budget — the post-R2-FIX-2 re-truncation path at lcm-recent-tool.ts:1944 actually fires, rather than the test silently passing because the synthetic content fit. 2. Adds structural assertions that the rollup-index header is well-formed, the period count in the header matches the number of `#### ` entries (R2-FIX-3 invariant cross-check), and the live-fallback section header is present — so a future refactor that drops the live-fallback merge or breaks the count-matching would now fail this test. 3. Asserts `details.truncated` is true, which is the externally observable signal callers act on; under-reporting was R2-H's concern about the pre-fix path. A truly black-box discriminator (response text differs pre- vs post-fix) is not achievable here because the outer enforceResponseBudget clamps both pipelines down to the same length whenever combinedContent overflows. The test now defends the structural integrity of the combined-output pipeline rather than a tautological budget check. No production change.
P0-2: Wire a synchronous, replay-clean resolver that walks
runtimeConfig.models.providers[*].models[*] for {id, contextWindow}
and returns a closure mapped to the registration-time snapshot.
Optionally seeds gaps from the pi-ai catalog (mod.getModels) at
lower precedence. Exposes via createLcmDependencies as
getModelContextWindow so lcm-recent's auto-detail-level can drop
its live-RPC tiers (Tier 2/3) in a follow-up commit.
Replay-determinism analysis: result is a pure function over the
snapshot; rebuild-per-process matches replay-machine guarantees;
no wall-clock or live RPC reads.
Finding ID: P0-2 (round2-C-design.md Item 1)
P0-2: Delete the unreachable Tier 2 (status RPC) and Tier 3
(models.list RPC) ladders from computeAutoDetailLevel. The
investigation confirmed both have never returned non-null:
- Tier 2's pickFirstNumber walked response paths
(sessions.byAgent[].entries[].usage.{promptTokens,...}) that
don't match openclaw's actual StatusSummary shape (which uses
recent[].totalTokens / recent[].contextTokens keyed by key).
- Tier 3 is unreachable because plugin-sdk doesn't expose
gateway-RPC invocation; callGateway throws on models.list.
Removed: runtimeStatusCache, modelContextWindowCache,
modelCatalogFetchPromise, runtimeStatusFetchPromise,
RUNTIME_STATUS_CACHE_TTL_MS, fetchRuntimeStatus,
ensureModelCatalogLoaded, pickFirstNumber. computeAutoDetailLevel
loses its callGateway and sessionKey parameters and is now
synchronous. Updated the single call site in the tool body.
Net diff: -215 / +21. Output is a pure function of (telemetry-row,
registry-snapshot) — replay-clean by construction. See
audit/pr516/INVESTIGATION-design-calls.md Question 1 for the full
reachability analysis.
Finding ID: P0-2 (round2-C-design.md Item 1)
P1-3: The post-build state-update block at the end of
buildDailyRollups read latestState / latestSummaryCreatedAt /
latestSummaryFingerprint OUTSIDE any transaction, decided
shouldClearPending, then wrote pending_rebuild=0 — a concurrent
ingest writer that flipped pending_rebuild=1 between our read and
our write was silently clobbered, leaving rollup state stale until
the next ingest re-armed.
Fix: wrap the read-decide-write in
withDatabaseTransaction(this.store.db, "BEGIN IMMEDIATE", ...) and
re-read all three fields INSIDE the txn body. With BEGIN IMMEDIATE,
SQLite blocks any concurrent writer until our COMMIT; our reads
observe the freshest committed value.
Tests: structural pin (P1-8 / R3-FIX-2 pattern) asserting all
three reads + the upsertState write appear AFTER
withDatabaseTransaction("BEGIN IMMEDIATE", ...), plus a runtime
race test that hooks getState to simulate a concurrent
last_message_at advance and asserts the builder re-arms
pending_rebuild=1 instead of clobbering to 0.
Finding ID: P1-3 (round2-C-design.md Item 3)
Adds `Clock { now(): Date }` interface to `LcmDependencies` as a REQUIRED
field. Plugin's `createLcmDependencies` populates it with a live-clock
impl (`{ now: () => new Date() }`). Tests inject a frozen clock for
deterministic windows.
This is the contract that read-path wall-clock reads will switch to in
follow-up commits — replacing direct `new Date()` / `Date.now()` calls
in `engine.ts` and `lcm-recent-tool.ts`. Object-with-method shape
matches the `subagent` convention in `PluginRuntime`, future-extensible
to `monotonic()` without breaking callers.
Updated `createTestDeps` in 4 test files (engine, bootstrap-flood,
session-operation-queues, lcm-command) so the typed return matches the
new required field. `*.test.ts` factories that cast `as unknown as
LcmDependencies` are unaffected.
Finding ID: P0-3 (B1, B2)
Replaces three direct \`new Date()\` reads in \`lcm-recent-tool.ts\` with \`deps.clock.now()\`: - \`resolveWindowPeriod\` (line ~424): \`last Nh\` / \`last Nm\` window end - \`resolvePeriod\` (line ~482): \`today\` anchor day-key resolution - \`assembleAcrossConversations\` (line ~1466): \`currentDayKey\` \`resolvePeriod\` and \`resolveWindowPeriod\` gain a required \`clock: Clock\` parameter, threaded from the tool's main \`execute\` via \`input.deps.clock\`. The new test \"anchors 'today' off the injected clock — B3 regression\" pins the contract: with a frozen clock, \`resolvePeriod(\"today\", ...)\` produces the expected day-key independent of the wall clock. \`new Date(value)\` calls that wrap a recorded value (\`row.built_at\`, \`rollup.period_start\`, etc.) are deterministic by construction and not changed. Existing \`__lcmRecentTestInternals.resolvePeriod\` callers updated to pass a live-clock third arg. \`makeRecentDeps()\` factory in \`rollup-store-builder.test.ts\` populated \`clock\` so the tool can find it at runtime. Finding ID: P0-3 (B3)
…sBack Removes the \`now = new Date()\` default arg on \`computeRollupMaintenanceDaysBack\` and threads the clock through the single caller in \`maintain()\` via \`this.deps.clock.now()\`. This is the last read-path wall-clock site in \`engine.ts\` related to rollup maintenance, bundling R3-D-1. Adds the regression test \"maintain() uses deps.clock.now() to compute rollup-maintenance daysBack — B4 regression\" that injects a frozen clock at engine construction (not via vi.setSystemTime) and asserts the computed daysBack uses the injected time, not the wall clock. With last_rollup_check_at at frozenAt - 2d 2h, daysBack must be 4 (ceil 2.083 + 1). Finding ID: P0-3 (B4) / R3-D-1
Removes \`Intl.DateTimeFormat().resolvedOptions().timeZone\` from the \`engine.timezone\` getter. The fallback caused read-path output to vary by host machine, defeating reproducibility — a workstation in Pacific time and a server in UTC produced different day-keys for the same recorded conversation. Behavior now: - If \`config.timezone\` is non-empty, return it as-is. - Otherwise return 'UTC' and emit a one-time \`log.warn\` at engine construction telling the operator to set the config. Subsequent getter calls do NOT re-warn (debounced by emitting only in the constructor). Adds two regression tests (B5): - empty config.timezone returns 'UTC' and warns exactly once across repeated getter calls - non-empty config.timezone returns it without warning Finding ID: P0-4 (B5)
Adds \`RollupStore.getTimezone(conversationId): string | null\` that reads the \`lcm_rollup_state.timezone\` column for one conversation (returning null if no row exists). \`upsertState\` already wrote this column; this is just the read path. \`lcm_recent\` now resolves timezone as \`store.getTimezone(conversationId) ?? engine.timezone\`, with the rollup state row winning over the engine-wide default. This means a conversation recorded in \`America/Los_Angeles\` keeps its day-keys / sub-day windows even if the operator later changes the plugin-level config or runs the engine on a host with a different default. Brand-new conversations without state rows still fall back to the engine default. Adds two B6 regression tests in \`rollup-store-builder.test.ts\`: - \`RollupStore.getTimezone\` returns null pre-state and the persisted tz post-\`upsertState\` - \`lcm_recent\` reads the persisted LA tz over engine UTC default — proven by a sub-day window that only intersects the seed under LA tz Finding ID: P0-4 (B6)
The strict-UTC getter and construction-time warning gated on
`config.timezone.length === 0`, so a whitespace-only string
(" ", a tab, etc.) bypassed BOTH the warning and the UTC
fallback. The getter returned the whitespace verbatim and
RollupBuilder threaded it into Intl.DateTimeFormat, which
throws RangeError on the first rollup-maintenance pass.
Switch both gates to `!config.timezone?.trim()` so empty,
whitespace, null and undefined all fall through to UTC and
emit the one-time warning.
Finding: RD-FIX-1 (P2) from audit/pr516/round-design-B-new-bugs.md.
…olver The optional pi-ai catalog seeding via mod.getModels was wired into the resolver signature but the production call site always passed undefined — pi-ai catalog seeding cannot run synchronously inside createLcmDependencies. The mod-only branch is dead code that adds function-surface noise. Remove the parameter from the signature, the seeding branch from the body, and the undefined arg from the call site. Drop the two mod-specific tests (they were testing dead behavior). Add a regression test pinning last-write-wins precedence across cross-provider model-id collisions and document it in the JSDoc. Finding: RD-FIX-2 (P3) from audit/pr516/round-design-B-new-bugs.md. Also addresses RD-FIX (P3-2) precedence-undocumented gap.
Batch A added \`export\` to computeAutoDetailLevel so the test file could import it directly. That widened the public API surface for what is genuinely internal logic — a downstream consumer importing it would lock in the signature and turn future changes into breaking ones. Drop the export keyword and add the function to the existing __lcmRecentTestInternals namespace alongside the other test hooks (resolvePeriod, extractRollupDigest, ...). Update the test to import via that namespace. Behavior is unchanged. Finding: RD-FIX-3 (P3) from audit/pr516/round-design-B-new-bugs.md.
…y drift execute() observed wall-clock at two independent sites: resolvePeriod (via deps.clock.now() → today day-key) and the currentDayKey gate at the rollup-state-vs-current-day check. With a live clock, two reads a few ms apart could land on different sides of local midnight, producing an inconsistent view of "today" within a single tool call. Capture deps.clock.now() exactly once at execute() entry as callTime and thread the captured Date through both sites. To keep resolvePeriod a pure function of its inputs, its signature changes from (period, timezone, clock: Clock) to (period, timezone, now: Date) — resolveWindowPeriod likewise. Tests in test/rollup-store-builder.test.ts update accordingly. Add a regression test pinning the contract: a stepping clock that crosses local midnight between calls is consumed exactly ONCE per lcm_recent invocation. Finding: RD-FIX-4 (P3) from audit/pr516/round-design-B-new-bugs.md.
1552e50 to
5902c47
Compare
…ent rejection + 5 user scenarios Per reviewer/operator feedback: the prior PR description was an architecture dump that didn't explain why v4.1 is positively better than the rollup approach in Martian-Engineering#516, didn't walk through user scenarios, and didn't reference the lcm_recent rejection history. This rewrite: - Leads with "Why we threw out lcm_recent" explaining the three failure modes we hit: repetition, compression-of-compression, and the inability to ask sideways (topic-not-time) questions. - Walks through 5 concrete user scenarios with before/after comparison: yesterday's work / paraphrastic rebase question / operator hard-forget / entity tracking ("all the work I've done with Voyage") / opt-in themes. - Adds a cost discipline table (per-tier model dispatch is the lever). - Adds "What v4.1 is NOT" (intentional non-goals: not RAG, not auto-rollups, not auto-tied to themes). - Operator setup walkthrough with expected log lines. - Architecture diagrams collapsed into <details> for reviewers who want the technical depth but skippable for first read. - Final.review (ec99fd0) findings documented in adversarial review history. Live-DB harness output (still PASSED) preserved as the smoking gun.
PR #516: Temporal spine:
lcm_recentrollups and bounded temporal recallMaintainer Quick Read
db62164626e680d35b5ef4cadc89d45ea8b4a3f9Why Maintainers Should Care
This is the time-native memory foundation. It lets an agent ask normal questions like "what happened yesterday?" or "what happened yesterday 4-8pm?" and get a bounded recap with coverage/provenance, while staying honest that the recap is not proof.
Stack Map
How It Works
flowchart TD A["Agent asks a time-bounded question"] --> B["Parse period in configured timezone"] B --> C{"Fresh stored rollup exists?"} C -- "yes: day/week/month" --> D["Read rollup summary and coverage"] C -- "no: missing, stale, current day, sub-day" --> E["Bounded fallback over leaf summaries/messages"] D --> F["Return recap, coverage, degraded/missing flags, optional sources"] E --> F F --> G["Use describe/expand for exact proof"]What Ships In This PR
lcm_recentfor day/week/month and sub-day window entry pointslcm_rollup_debugwith source redaction by defaultWhat This PR Intentionally Does Not Do
lcm_recentread pathHow An Agent Uses It
lcm_recent({ period: "yesterday" })reads a fresh day rollup or returns an explicit degraded/missing result with bounded fallback.lcm_recentonly to find candidate windows/sources, then verify with describe/expand/GitHub.Engineering Boundaries
lcm_describe,lcm_expand,lcm_expand_query, raw messages, GitHub, logs, tickets, or the relevant authority.includeSourcesor debug affordance.Primary Review Files
src/tools/lcm-recent-tool.tssrc/rollup-builder.tssrc/store/rollup-store.tssrc/tools/lcm-rollup-debug-tool.tstest/rollup-store-builder.test.tstest/lcm-tools.test.tsSize And Review Surface
These are intentionally large LCM changes, so the body separates the GitHub-visible diff from the intended logical review slice. Later PRs target
main, which makes the GitHub Files tab include parent-stack code; reviewers should use the logical parent listed above when judging the slice.mainmainLogical slice breakdown:
Review Concerns Already Folded Into The Code
julianday()correctness and now use matching expression indexes for indexed range/order plans.lcm_recentreads and reports missing/stale state rather than rebuilding.Validation Evidence
npm test -- --run test/rollup-store-builder.test.ts test/lcm-tools.test.ts test/plugin-prompt-hook.test.tsnpm testnpm run buildgit diff --checktestcheck: successSuggested Maintainer Review Path
mainwhile the series is logically stacked.