feat: add temporal lcm_recent rollups (1/10) by 100yenadmin · Pull Request #516 · Martian-Engineering/lossless-claw

100yenadmin · 2026-04-28T13:58:44Z

PR #516: Temporal spine: `lcm_recent` rollups and bounded temporal recall

Maintainer Quick Read

Field	Value
Stack position	1/10
Logical parent for review	main
Current head	`db62164626e680d35b5ef4cadc89d45ea8b4a3f9`
Current PR state	Ready for review/merge after normal maintainer sign-off
Merge note	Base of the production path.
Last body refresh	2026-05-02

Why Maintainers Should Care

This is the time-native memory foundation. It lets an agent ask normal questions like "what happened yesterday?" or "what happened yesterday 4-8pm?" and get a bounded recap with coverage/provenance, while staying honest that the recap is not proof.

Stack Map

flowchart LR
  P516["#516 Temporal spine"] --> P517["#517 Observed work read model"]
  P517 --> P518["#518 Suggestion ledger"]
  P518 --> P537["#537 Deterministic extraction"]
  P537 --> P538["#538 Event observations"]
  P518 --> P539["#539 Opt-in suggestion tools"]
  P538 --> P540["#540 Maintenance wiring"]
  P540 --> P530["#530 Tracker/open state"]
  P538 --> P531["#531 Event episodes"]
  P530 --> P532["#532 Topic/vocabulary"]
  P526["#526 Draft reference"] -. "not the merge path" .-> P516

How It Works

flowchart TD
  A["Agent asks a time-bounded question"] --> B["Parse period in configured timezone"]
  B --> C{"Fresh stored rollup exists?"}
  C -- "yes: day/week/month" --> D["Read rollup summary and coverage"]
  C -- "no: missing, stale, current day, sub-day" --> E["Bounded fallback over leaf summaries/messages"]
  D --> F["Return recap, coverage, degraded/missing flags, optional sources"]
  E --> F
  F --> G["Use describe/expand for exact proof"]

What Ships In This PR

lcm_recent for day/week/month and sub-day window entry points
daily, weekly, and monthly rollup storage plus source/provenance tables
bounded source fallback for missing, stale, current-day, or precise sub-day windows
operator/debug visibility through lcm_rollup_debug with source redaction by default
strict date/time behavior including invalid date rejection, DST-gap rejection, and UTC+13 local date preservation

What This PR Intentionally Does Not Do

does not prove exact commands, SHAs, file paths, timestamps, root cause, or causal chains
does not write Cortex memory or OpenClaw task state
does not build or rebuild rollups from the lcm_recent read path

How An Agent Uses It

"What happened yesterday?" -> lcm_recent({ period: "yesterday" }) reads a fresh day rollup or returns an explicit degraded/missing result with bounded fallback.
"What happened yesterday 4-8pm?" -> precise windows use bounded source fallback, not a whole-day rollup pretending to have sub-day precision.
"What exactly shipped?" -> use lcm_recent only to find candidate windows/sources, then verify with describe/expand/GitHub.

Engineering Boundaries

LCM is an evidence layer. It does not become Cortex, GBrain, OpenClaw Tasks, or a fuzzy semantic authority.
Recaps and observed states are entry points for investigation, not proof of exact commands, SHAs, paths, timestamps, root cause, or task completion.
Exact claims still require source drilldown through lcm_describe, lcm_expand, lcm_expand_query, raw messages, GitHub, logs, tickets, or the relevant authority.
Source identifiers stay hidden by default unless a tool has an explicit includeSources or debug affordance.
Reads stay read-only. Writes happen through explicit storage, maintenance, extraction, or review-ledger paths for the slice that owns them.

Primary Review Files

src/tools/lcm-recent-tool.ts
src/rollup-builder.ts
src/store/rollup-store.ts
src/tools/lcm-rollup-debug-tool.ts
test/rollup-store-builder.test.ts
test/lcm-tools.test.ts

Size And Review Surface

These are intentionally large LCM changes, so the body separates the GitHub-visible diff from the intended logical review slice. Later PRs target main, which makes the GitHub Files tab include parent-stack code; reviewers should use the logical parent listed above when judging the slice.

View	Compare base	Files	Additions	Deletions
GitHub-visible PR page	`main`	23	7785	44
Intended reviewer slice	`main`	45	7907	2140

Logical slice breakdown:

Category	Files	Additions	Deletions
runtime code	16	3936	927
tests	10	2758	1153
docs/release notes	17	1205	52
metadata/package	2	8	8

Review Concerns Already Folded Into The Code

Fallback message timestamp scans keep mixed-format julianday() correctness and now use matching expression indexes for indexed range/order plans.
Read-path mutation is guarded: lcm_recent reads and reports missing/stale state rather than rebuilding.
Stale/pending rollups are bypassed so the tool does not serve a stale day as fresh.
Rollup rebuild state uses source summary fingerprints/watermarks so content edits and summary changes invalidate correctly.
Debug/source surfaces redact source IDs by default.

Validation Evidence

npm test -- --run test/rollup-store-builder.test.ts test/lcm-tools.test.ts test/plugin-prompt-hook.test.ts
npm test
npm run build
git diff --check
GitHub test check: success

Suggested Maintainer Review Path

Start with the Why Maintainers Should Care and How It Works sections.
Use Primary Review Files for the logical slice; the GitHub Files tab can look much larger because the public PRs target main while the series is logically stacked.
Check the non-goals before asking whether LCM should create tasks, close tasks, write Cortex memory, or make authoritative project claims.
Review and merge in the order shown in the stack diagram. feat: implement LCM architecture follow-up (draft) #526 stays draft/reference unless maintainers explicitly choose a single large integration branch.

Copilot

Pull request overview

This PR adds a temporal rollup layer to lossless-claw and exposes it via a new lcm_recent tool (plus an opt-in lcm_rollup_debug tool), enabling deterministic day/week/month recaps and bounded sub-day window recall with provenance.

Changes:

Introduces SQLite schema + store + builder for day/week/month rollups with provenance (lcm_rollups, lcm_rollup_sources, lcm_rollup_state).
Adds lcm_recent for deterministic period/window parsing and bounded leaf-summary SQL fallback when rollups are not appropriate/available.
Wires rollup rebuilds into ingest/maintenance, adds operator-only lcm_rollup_debug, updates docs/changesets, and adds comprehensive test coverage.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`src/db/migration.ts`	Adds rollup tables/indexes/views and rollup-state columns.
`src/store/rollup-store.ts`	Implements rollup/state CRUD + provenance source replacement and leaf-summary lookups.
`src/rollup-builder.ts`	Builds daily rollups from leaf summaries and aggregates into week/month rollups.
`src/tools/lcm-recent-tool.ts`	Implements `lcm_recent` period/window parsing, rollup selection, and bounded SQL fallback rendering/budgeting.
`src/tools/lcm-rollup-debug-tool.ts`	Implements `lcm_rollup_debug` inspection with source redaction unless `includeSources=true`.
`src/engine.ts`	Instantiates rollup services; marks conversations dirty on ingest; triggers rebuilds after maintenance.
`src/plugin/index.ts`	Registers `lcm_recent` by default and `lcm_rollup_debug` only when enabled.
`src/db/config.ts`	Adds `rollupDebugEnabled` config + env var resolver.
`openclaw.plugin.json`	Documents/configures `rollupDebugEnabled` in plugin settings schema.
`docs/configuration.md`	Documents `LCM_ROLLUP_DEBUG_ENABLED` / `rollupDebugEnabled`.
`README.md`	Adds `LCM_ROLLUP_DEBUG_ENABLED` to env var table.
`skills/lossless-claw/references/recall-tools.md`	Updates recall-tool guidance with availability gating + “recap vs proof” guardrails.
`skills/lossless-claw/SKILL.md`	Updates skill overview and guidance to use `lcm_recent` only when available.
`specs/lcm-temporal-memory-plan.md`	Adds/records consolidated implementation plan and tool contract.
`test/rollup-store-builder.test.ts`	Adds extensive coverage for schema, DST windows, fallback behavior, aggregation, and debug redaction/caps.
`.changeset/lcm-temporal-rollups.md`	Declares minor release for temporal rollups/tools.
`.changeset/lcm-temporal-skill-docs.md`	Declares patch release for skill/doc guidance updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

100yenadmin · 2026-04-28T20:09:29Z

title: "LCM upstream PR stack scenario flows"
type: handoff-test
created: 2026-04-29
status: active
tags: [lossless-claw, lcm, lcm_recent, observed-work, task-bridge, scenarios, agent-flows]
related:

docs/audits/HANDOFF-lcm-upstream-pr-stack-2026-04-29.md
docs/audits/R-515-lcm-rollup-pr-stack-gap-report-2026-04-28.md
docs/architecture/lcm-observed-work-density-option-b-2026-04-28.md
docs/architecture/lcm-task-bridge-option-c-experimental-2026-04-28.md
feat: add temporal lcm_recent rollups (1/10) #516
feat: add LCM observed work density (2/10) #517
feat: add experimental LCM task bridge suggestions (3/10) #518

LCM upstream PR stack scenario flows

This document pressure-tests the planned LCM stack against realistic agent workflows.

The goal is not just “does the schema exist?” The goal is: when a user asks a natural operational question, which tool does the agent call, what layer owns the answer, when does it escalate, and what must never happen silently?

Layer model being tested

LCM temporal spine — PR feat: add temporal lcm_recent rollups (1/10) #516
- Tool: lcm_recent
- Optional explicit/admin writer: lcm_rollup_build or equivalent maintenance path
- Purpose: answer bounded temporal recall questions cheaply and conservatively.
Observed work density — PR feat: add LCM observed work density (2/10) #517
- Tool: lcm_work_density
- Purpose: summarize work signals from conversation evidence.
- Authority: advisory only; not tasks.
Task bridge suggestions — PR feat: add experimental LCM task bridge suggestions (3/10) #518
- Possible future tool: lcm_task_suggestions
- Purpose: store/review suggestions derived from observed work.
- Authority: inert suggestion ledger only; no automatic task writes.
Deep recall / proof layer — existing LCM tools
- Tools: lcm_grep, lcm_describe, lcm_expand_query
- Purpose: when rollups/density are missing, ambiguous, or need exact evidence.
External authority tools
- GitHub skill / gh: PR/issue truth.
- OpenClaw task system / TaskFlow: operational commitments.
- Cortex: authoritative memory/identity-like semantic memory.
- These remain separate from LCM observations.

Scenario 1 — “What did we get done yesterday?”

User asks

What did we get done yesterday?

Ideal agent flow

Call lcm_recent({ period: "yesterday" }).
- Purpose: get the temporal narrative.
- Expected from feat: add temporal lcm_recent rollups (1/10) #516: daily rollup if built; otherwise explicit missing/degraded result.
Call lcm_work_density({ period: "yesterday", detailLevel: 1 }).
- Purpose: get observed completed/unfinished/ambiguous counts and top items.
- Expected from feat: add LCM observed work density (2/10) #517: advisory density report.
If lcm_work_density flags ambiguous items, call lcm_expand_query for the top 1–3 uncertain claims.
- Purpose: proof before making strong claims.

Expected answer shape

Observed work yesterday:
- Completed/likely completed: 11 items
- Still unfinished: 5
- Ambiguous: 2

Highlights:
1. Fleet rollout completed on six healthy VMs.
2. Image generation default moved to openai/gpt-image-2 fleet-wide.
3. David was isolated as a transport/runtime outlier and ticketed.

Still open:
1. David image POST transport issue — ticketed, not solved.
2. LCM upstream PR stack cleanup/consolidation.

Confidence: medium-high.
Note: This is observed from LCM summaries, not authoritative task state.

Ownership boundary

LCM may say “appears completed” or “observed completed.”
LCM must not mark OpenClaw tasks done.
If the user asks for authoritative task status, use OpenClaw task/TaskFlow/GitHub state, not LCM alone.

Implementation implications

#516 must make yesterday’s rollup cheap and stable. #517 must expose density without pretending to be a task system.

Scenario 2 — “Where are we on the upstream LCM PRs?”

User asks

Where are we on the upstream LCM PRs?

Ideal agent flow

Call GitHub via gh pr view/list for canonical truth:
Call lcm_recent({ period: "today", topic: "LCM upstream PRs" }) if topic filtering exists; otherwise lcm_grep("LCM upstream PRs").
- Purpose: recover recent planning/context.
Call lcm_work_density({ period: "today", topic: "LCM upstream PRs", statuses: ["observed_unfinished", "observed_ambiguous"] }).
- Purpose: identify remaining observed work.
If user wants handoff-quality detail, call lcm_expand_query against the handoff/audit summaries or read the artifact files.

Expected answer shape

GitHub truth:
- #516 temporal lcm_recent rollups: open/draft, backbone.
- #517 observed work density: open/draft, depends on #516.
- #518 task bridge suggestions: open/draft, depends on #517.

Observed plan:
1. Finish #516 daily/weekly/monthly correctness first.
2. Keep #517 advisory/read-only.
3. Keep #518 inert/suggestion-only.

Local cleanup remains: close/supersede fork PRs once upstream links are verified.

Ownership boundary

GitHub owns PR state.
LCM owns “what we discussed/planned.”
Observed work density may say “cleanup appears unfinished,” but GitHub confirms actual open/closed status.

Implementation implications

Observed work should include source/period/topic filters, but should not replace live GitHub checks.

Scenario 3 — “What’s blocked?”

User asks

What’s blocked right now?

Ideal agent flow

Call lcm_work_density({ period: "7d", statuses: ["observed_unfinished", "observed_ambiguous"], kinds: ["blocker", "risk", "follow_up"], detailLevel: 2 }).
If a blocker is high-impact, call lcm_expand_query for evidence:
- “What evidence says this is still blocked? Was it later resolved?”
If the blocker maps to GitHub PR/issue, call GitHub for live state.

Expected answer shape

Observed blockers:
1. David image generation transport issue — unresolved, ticketed in Intercom.
   - LCM evidence: fleet rollout smoke failed only on David after provider routing was fixed.
   - Authority: Intercom ticket exists; not an LCM task.

2. LCM #516 merge readiness — still needs daily/weekly/monthly correctness validation.
   - Authority: GitHub PR state and tests.

3. Local fork PR cleanup — not yet done.
   - Authority: GitHub fork PR state.

Ownership boundary

LCM can surface “observed blocker.”
It must not create a task or reminder unless user explicitly asks.
If the user asks to track it, bridge/task system owns that commitment.

Implementation implications

This is the strongest justification for #517. Users ask “what’s blocked?” more often than “show me a rollup row.”

Scenario 4 — “What happened yesterday between 4 and 8pm?”

User asks

What happened yesterday from 4 to 8pm?

Ideal agent flow

Call lcm_recent({ period: "yesterday 4-8pm" }) once feat: add temporal lcm_recent rollups (1/10) #516 supports sub-day windows.
If unsupported or rollup missing, fallback explicitly:
- lcm_grep or bounded summary query with since/before timestamps.
- Convert local time to UTC safely.
If exact causal chain is needed, call lcm_expand_query on candidate summaries.

Expected answer shape

For yesterday 16:00–20:00 local time:
- We investigated Golden/David image-generation config differences.
- We confirmed Golden needed Codex OAuth auth state, not just config.
- Golden smoke passed after auth profile copy.
- David remained an outlier later.

Coverage: bounded to summaries/messages in that local-time window.
Confidence: medium; exact command output available via expansion.

Ownership boundary

If the local time is invalid due to DST, reject with a clear error.
Do not “round” nonexistent local times.
Do not answer from a whole-day rollup as if it proves a sub-day window.

Implementation implications

Sub-day windows are Patch 4, not a day-one dependency. They require careful timezone/DST tests.

Scenario 5 — “Make me a handoff for the next agent”

User asks

Make me a handoff for the agent taking over this work.

Ideal agent flow

Call lcm_recent({ period: "today", topic: "LCM PR stack" }) for compact current context.
Call lcm_work_density({ period: "today", topic: "LCM PR stack", detailLevel: 2 }) to identify completed/open/ambiguous work.
Call live GitHub checks for PR/issue truth.
Read existing artifacts from disk if known:
- handoff doc
- audit doc
- architecture docs
Produce a handoff artifact, not just chat prose.

Expected artifact sections

Canonical upstream PRs and order.
What is already done.
What is still open.
What must not be changed.
Patch sequence.
Validation checklist.
Local cleanup plan.
Copy-paste agent prompt.

Ownership boundary

LCM provides continuity.
GitHub provides upstream truth.
Files/docs become the handoff artifact.

Implementation implications

This validates that LCM recall plus observed work can produce operationally useful handoffs while still verifying mutable external state live.

Scenario 6 — “Should we create tasks for these unfinished things?”

User asks

Should we make tasks from the unfinished items?

Ideal agent flow

Call lcm_work_density({ period: "7d", statuses: ["observed_unfinished"], includeSources: true, minConfidence: 0.8 }).
If Option C is enabled, call future lcm_task_suggestions({ dryRun: true, suggestionKinds: ["create_task"], minConfidence: 0.85 }).
Present suggestions for review.
Only after explicit user approval, use the real task system to create tasks.

Expected answer shape

I found three good task candidates from observed evidence:

1. Close/supersede local fork PRs #12–#18.
   - Confidence: high.
   - Evidence: upstream #516/#517/#518 now canonical.

2. Finish #516 daily/weekly/monthly rollup correctness.
   - Confidence: high.
   - Evidence: R-515 gap report.

3. Revisit David image-generation transport issue.
   - Confidence: medium-high.
   - Evidence: Intercom ticket #96638636.

Want me to create these as OpenClaw tasks?

Ownership boundary

LCM suggests.
User approves.
Task system creates.
LCM never silently writes tasks.

Implementation implications

This is #518’s entire point. It must prove restraint.

Scenario 7 — Missing rollup / cold start

User asks

What happened this week?

Bad behavior to avoid

Silently fall back to raw grep and present it as a full week rollup.
Build a rollup during the read call.
Return a confident recap when source coverage is partial.

Ideal agent flow

Call lcm_recent({ period: "week" }).
If response says week rollup missing:
- Tell the user coverage is degraded.
- Optionally call bounded lcm_grep/lcm_expand_query for a best-effort answer.
- Optionally suggest explicit rollup build if the user/operator wants it.
If explicit build is available and user/admin context allows:
- call lcm_rollup_build({ period: "week" }) or schedule maintenance.
- then retry lcm_recent.

Expected answer shape

I don’t have a built week rollup yet, so this is a best-effort answer from bounded summaries, not a complete weekly rollup.

Best-effort themes:
- Fleet rollout completed.
- Image-gen default rollout mostly completed; David ticketed.
- LCM upstream handoff and scenario docs created.

If you want a stronger answer, I can run/build the week rollup and retry.

Ownership boundary

Reads do not mutate. Build is explicit.

Implementation implications

#516 needs structured missing/degraded responses and a separate build path.

Scenario 8 — “Did we already file this upstream?”

User asks

Did we already file this upstream, or is it only local?

Ideal agent flow

Search GitHub upstream issues/PRs live.
Search local/fork PRs live.
Call lcm_grep for recent discussion if needed.
Use lcm_work_density only to identify observed cleanup items, not truth.

Expected answer shape

Upstream truth:
- The active PRs are #516/#517/#518 in Martian-Engineering/lossless-claw.
- The 100yenadmin PRs are local mirrors/historical branches.

Local cleanup needed:
- Close/supersede #16/#17/#18 with upstream links.
- Close/supersede #12–#15 after verifying all review notes are captured.
- Defer/cherry-pick #4–#11 ideas as noted.

Ownership boundary

GitHub owns “filed upstream.” LCM only remembers the plan.

Implementation implications

This scenario proves agents must combine LCM with live external checks.

Scenario 9 — “What should I read first?”

User/agent asks

I’m taking over. What should I read first?

Ideal agent flow

Read handoff doc.
Read R-515 gap report.
Read Option B/C docs.
Read upstream PR descriptions/diffs.
Run tests/build only after understanding the target.

Expected answer shape

Read in this order:
1. HANDOFF-lcm-upstream-pr-stack-2026-04-29.md
2. R-515-lcm-rollup-pr-stack-gap-report-2026-04-28.md
3. lcm-observed-work-density-option-b-2026-04-28.md
4. lcm-task-bridge-option-c-experimental-2026-04-28.md
5. Upstream PRs #516/#517/#518 and their diffs

Ownership boundary

Docs guide work; PR diffs and tests verify implementation.

Scenario 10 — “How does this avoid becoming a second memory/task brain?”

User asks

How do these layers not collapse into a mess?

Expected answer

The stack separates three questions:

1. When did something happen?
   - LCM temporal spine: lcm_recent.

2. What does conversation evidence suggest is done/unfinished?
   - Observed work density: lcm_work_density.
   - Advisory only.

3. Should this become an operational task?
   - OpenClaw task system / TaskFlow, optionally assisted by inert suggestions.
   - Requires explicit acceptance.

LCM never becomes the authority for tasks. It provides evidence and summaries.

Implementation implications

The words matter. Tool outputs and docs must consistently say “observed,” “appears,” “evidence,” and “confidence,” not “task is done.”

Scenario matrix

Scenario	First tool	Escalation	External truth	Must not do
What happened yesterday?	`lcm_recent`	`lcm_work_density`, `lcm_expand_query`	none unless PR/task claims	claim authority from observation
What got done?	`lcm_work_density`	`lcm_expand_query`	GitHub/tasks if named	mark tasks done
What is blocked?	`lcm_work_density`	`lcm_expand_query`	GitHub/Intercom/tasks	create tasks silently
Where are PRs?	GitHub skill / `gh`	`lcm_recent`, `lcm_grep`	GitHub	trust stale rollup over live PR
Sub-day window	`lcm_recent`	bounded grep/expand	none	fake precision from day rollup
Missing rollup	`lcm_recent`	grep/expand, explicit build	none	mutate on read
Create task candidates	`lcm_work_density`	`lcm_task_suggestions` dry-run	task system	direct write from LCM
Local cleanup	GitHub skill / `gh`	LCM handoff docs	GitHub	delete branches before salvage

Resulting design pressure on PRs

#516 must provide

read-only lcm_recent
explicit degraded/missing rollup response
explicit/admin rollup build path or maintenance path
daily/weekly/monthly real builders
provenance and coverage accounting
timezone/DST-safe windows before sub-day support ships

#517 must provide

observed-work storage and read tool
status vocabulary that avoids task authority
confidence/rationale/provenance
compact density response
no read mutation
no sync to Cortex or tasks

#518 must provide

suggestion storage only
review states: pending/accepted/rejected/dismissed/expired
source IDs and confidence
tests proving no OpenClaw task writes
no registered write tool by default

Final takeaway

The stack works if each layer answers one narrow question:

LCM recent: “What happened in this time window?”
Observed work density: “What does the evidence suggest is done/unfinished?”
Task bridge suggestions: “Would you like to turn this evidence into an explicit task action?”

If any layer starts silently mutating the next layer, the architecture fails.

100yenadmin · 2026-04-30T04:07:30Z

Final adversarial pass - merge order 1/10

Score: mergeability 9.8/10, functionality 9.4/10, description 9.7/10.

Functionality audit: Works as the temporal spine. lcm_recent covers today/yesterday/date/week/month/7d/30d plus sub-day bounded windows; missing/stale/degraded coverage is explicit; debug source IDs stay hidden unless requested; exact proof still routes to source expansion. The final hardening fixed stale daily rollups being promoted into ready aggregate recaps and added raw-message timestamp indexes for bounded fallback.

Scenario coverage: Supports temporal recap and “yesterday 4-8pm” style entry flows. It intentionally does not prove exact commands, SHAs, root cause, shipped status, or task status.

Final state: MERGEABLE, GitHub test pass, GraphQL review threads: 0 unresolved. Local focused suite, full npm test, npm run build, and git diff --check passed on head 076e476.

Residual caveat: lcm_recent is a recap/window-entry tool, not proof. Merge this first.

Copilot

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

100yenadmin · 2026-05-01T08:52:55Z

Final May 1 hardening/status update:

Merge order: 1/10 — Temporal spine
Head: f51042f
Review state: 0 unresolved GraphQL review threads after the latest push
GitHub state: MERGEABLE / CLEAN, test check green
Validation: 47 focused tests; build; diff-check
Score: Functionality 10/10, mergeability 10/10, description 10/10
Note: Ready first. Latest hardening prevents pending/stale rollups from hiding fresh fallback evidence and keeps rebuild pending when leaf summaries change during a sweep.

The PR description top note has been refreshed with current head SHA, visible-vs-logical LOC breakdown, scope, caveats, and validation status. LCM remains an evidence layer: recaps, observed work, events, episodes, and suggestions are entry points, not proof or task authority.

Copilot

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ve ints Round 1's P2-2 clamped the env path via \`parseFinitePositiveInt\`, but the plugin-config path still resolved through plain \`toNumber\` and silently accepted 0 / negative — manifest's \`minimum:1\` invariant held only when the env var was the source of truth. Add \`toPositiveNumber\` and route the three rollup token caps through it so both config sources enforce the same floor. Closes R2-FIX-5 from audit/pr516/round2-A-fix-correctness.md (P2-2 caveat).

R3-FIX-1: both queries `ORDER BY julianday(...) DESC LIMIT 1001` lacked a PK tiebreaker. Deterministic within a single SQLite instance via rowid stability, but breaks across `.dump | sqlite3 newdb` reimport — two rows with identical timestamps flip ordering after the round-trip. Adds `, summary_id ASC` to the summaries fallback query and `, m.message_id ASC` to the messages fallback query. Mirrors the c192ee8 fix for listConversationsBySessionKey (P1-9). Exposes getRecentSummaryFallback via __lcmRecentTestInternals so the regression test can drive it directly. Adds a regression test that inserts two summaries with identical latest_at/created_at in reverse alphabetical insertion order and asserts the ASC tiebreak puts the lexicographically smaller summary_id first across repeated invocations.

…TE tx R3-FIX-2: P1-8 closed the FK-consistency hole in buildAggregateRollup and the non-empty buildDayRollup path, but the empty-day cleanup branches still ran `getRollup` + `deleteRollup` outside any transaction. A concurrent rebuild that upserts a fresh rollup at the same (conversation, kind, key) slot had its work destroyed by a stale-empty-snapshot delete from a racing caller. Self-healing via pending_rebuild=1, but transient data loss until next maintain. Wraps both empty-day cleanup sites — one in buildDailyRollups' main loop, one in buildDayRollup itself — in `withDatabaseTransaction(this.store.db, "BEGIN IMMEDIATE", …)` and reads getRollup INSIDE the transaction so the decision to delete is made against the row that's actually there at lock time. Adds a structural regression test (mirrors P1-8's source-text scan) asserting that in both empty-day branches `getRollup(` and `deleteRollup(` appear AFTER `withDatabaseTransaction(`. Per Round-1 spec, deterministic concurrency repros are hard; the structural assertion is the accepted approach.

R3-FIX-3: the original assertion `details.tokenCount <= 500` would also pass on the pre-R2-FIX-2 code because the outer enforceResponseBudget at lcm-recent-tool.ts:2133 unconditionally clamps responses to budget. The fix is correct; the test was tautological (R3-H ⚠ from audit/pr516/SUMMARY_R3.md). Strengthens the test in three ways: 1. Bulks the fixture to six prior-day digests + four live-fallback leaves so combinedContent (the inner index assembly) genuinely overflows the 500-token budget — the post-R2-FIX-2 re-truncation path at lcm-recent-tool.ts:1944 actually fires, rather than the test silently passing because the synthetic content fit. 2. Adds structural assertions that the rollup-index header is well-formed, the period count in the header matches the number of `#### ` entries (R2-FIX-3 invariant cross-check), and the live-fallback section header is present — so a future refactor that drops the live-fallback merge or breaks the count-matching would now fail this test. 3. Asserts `details.truncated` is true, which is the externally observable signal callers act on; under-reporting was R2-H's concern about the pre-fix path. A truly black-box discriminator (response text differs pre- vs post-fix) is not achievable here because the outer enforceResponseBudget clamps both pipelines down to the same length whenever combinedContent overflows. The test now defends the structural integrity of the combined-output pipeline rather than a tautological budget check. No production change.

P0-2: Wire a synchronous, replay-clean resolver that walks runtimeConfig.models.providers[*].models[*] for {id, contextWindow} and returns a closure mapped to the registration-time snapshot. Optionally seeds gaps from the pi-ai catalog (mod.getModels) at lower precedence. Exposes via createLcmDependencies as getModelContextWindow so lcm-recent's auto-detail-level can drop its live-RPC tiers (Tier 2/3) in a follow-up commit. Replay-determinism analysis: result is a pure function over the snapshot; rebuild-per-process matches replay-machine guarantees; no wall-clock or live RPC reads. Finding ID: P0-2 (round2-C-design.md Item 1)

P0-2: Delete the unreachable Tier 2 (status RPC) and Tier 3 (models.list RPC) ladders from computeAutoDetailLevel. The investigation confirmed both have never returned non-null: - Tier 2's pickFirstNumber walked response paths (sessions.byAgent[].entries[].usage.{promptTokens,...}) that don't match openclaw's actual StatusSummary shape (which uses recent[].totalTokens / recent[].contextTokens keyed by key). - Tier 3 is unreachable because plugin-sdk doesn't expose gateway-RPC invocation; callGateway throws on models.list. Removed: runtimeStatusCache, modelContextWindowCache, modelCatalogFetchPromise, runtimeStatusFetchPromise, RUNTIME_STATUS_CACHE_TTL_MS, fetchRuntimeStatus, ensureModelCatalogLoaded, pickFirstNumber. computeAutoDetailLevel loses its callGateway and sessionKey parameters and is now synchronous. Updated the single call site in the tool body. Net diff: -215 / +21. Output is a pure function of (telemetry-row, registry-snapshot) — replay-clean by construction. See audit/pr516/INVESTIGATION-design-calls.md Question 1 for the full reachability analysis. Finding ID: P0-2 (round2-C-design.md Item 1)

P1-3: The post-build state-update block at the end of buildDailyRollups read latestState / latestSummaryCreatedAt / latestSummaryFingerprint OUTSIDE any transaction, decided shouldClearPending, then wrote pending_rebuild=0 — a concurrent ingest writer that flipped pending_rebuild=1 between our read and our write was silently clobbered, leaving rollup state stale until the next ingest re-armed. Fix: wrap the read-decide-write in withDatabaseTransaction(this.store.db, "BEGIN IMMEDIATE", ...) and re-read all three fields INSIDE the txn body. With BEGIN IMMEDIATE, SQLite blocks any concurrent writer until our COMMIT; our reads observe the freshest committed value. Tests: structural pin (P1-8 / R3-FIX-2 pattern) asserting all three reads + the upsertState write appear AFTER withDatabaseTransaction("BEGIN IMMEDIATE", ...), plus a runtime race test that hooks getState to simulate a concurrent last_message_at advance and asserts the builder re-arms pending_rebuild=1 instead of clobbering to 0. Finding ID: P1-3 (round2-C-design.md Item 3)

Adds `Clock { now(): Date }` interface to `LcmDependencies` as a REQUIRED field. Plugin's `createLcmDependencies` populates it with a live-clock impl (`{ now: () => new Date() }`). Tests inject a frozen clock for deterministic windows. This is the contract that read-path wall-clock reads will switch to in follow-up commits — replacing direct `new Date()` / `Date.now()` calls in `engine.ts` and `lcm-recent-tool.ts`. Object-with-method shape matches the `subagent` convention in `PluginRuntime`, future-extensible to `monotonic()` without breaking callers. Updated `createTestDeps` in 4 test files (engine, bootstrap-flood, session-operation-queues, lcm-command) so the typed return matches the new required field. `*.test.ts` factories that cast `as unknown as LcmDependencies` are unaffected. Finding ID: P0-3 (B1, B2)

Replaces three direct \`new Date()\` reads in \`lcm-recent-tool.ts\` with \`deps.clock.now()\`: - \`resolveWindowPeriod\` (line ~424): \`last Nh\` / \`last Nm\` window end - \`resolvePeriod\` (line ~482): \`today\` anchor day-key resolution - \`assembleAcrossConversations\` (line ~1466): \`currentDayKey\` \`resolvePeriod\` and \`resolveWindowPeriod\` gain a required \`clock: Clock\` parameter, threaded from the tool's main \`execute\` via \`input.deps.clock\`. The new test \"anchors 'today' off the injected clock — B3 regression\" pins the contract: with a frozen clock, \`resolvePeriod(\"today\", ...)\` produces the expected day-key independent of the wall clock. \`new Date(value)\` calls that wrap a recorded value (\`row.built_at\`, \`rollup.period_start\`, etc.) are deterministic by construction and not changed. Existing \`__lcmRecentTestInternals.resolvePeriod\` callers updated to pass a live-clock third arg. \`makeRecentDeps()\` factory in \`rollup-store-builder.test.ts\` populated \`clock\` so the tool can find it at runtime. Finding ID: P0-3 (B3)

…sBack Removes the \`now = new Date()\` default arg on \`computeRollupMaintenanceDaysBack\` and threads the clock through the single caller in \`maintain()\` via \`this.deps.clock.now()\`. This is the last read-path wall-clock site in \`engine.ts\` related to rollup maintenance, bundling R3-D-1. Adds the regression test \"maintain() uses deps.clock.now() to compute rollup-maintenance daysBack — B4 regression\" that injects a frozen clock at engine construction (not via vi.setSystemTime) and asserts the computed daysBack uses the injected time, not the wall clock. With last_rollup_check_at at frozenAt - 2d 2h, daysBack must be 4 (ceil 2.083 + 1). Finding ID: P0-3 (B4) / R3-D-1

Removes \`Intl.DateTimeFormat().resolvedOptions().timeZone\` from the \`engine.timezone\` getter. The fallback caused read-path output to vary by host machine, defeating reproducibility — a workstation in Pacific time and a server in UTC produced different day-keys for the same recorded conversation. Behavior now: - If \`config.timezone\` is non-empty, return it as-is. - Otherwise return 'UTC' and emit a one-time \`log.warn\` at engine construction telling the operator to set the config. Subsequent getter calls do NOT re-warn (debounced by emitting only in the constructor). Adds two regression tests (B5): - empty config.timezone returns 'UTC' and warns exactly once across repeated getter calls - non-empty config.timezone returns it without warning Finding ID: P0-4 (B5)

Adds \`RollupStore.getTimezone(conversationId): string | null\` that reads the \`lcm_rollup_state.timezone\` column for one conversation (returning null if no row exists). \`upsertState\` already wrote this column; this is just the read path. \`lcm_recent\` now resolves timezone as \`store.getTimezone(conversationId) ?? engine.timezone\`, with the rollup state row winning over the engine-wide default. This means a conversation recorded in \`America/Los_Angeles\` keeps its day-keys / sub-day windows even if the operator later changes the plugin-level config or runs the engine on a host with a different default. Brand-new conversations without state rows still fall back to the engine default. Adds two B6 regression tests in \`rollup-store-builder.test.ts\`: - \`RollupStore.getTimezone\` returns null pre-state and the persisted tz post-\`upsertState\` - \`lcm_recent\` reads the persisted LA tz over engine UTC default — proven by a sub-day window that only intersects the seed under LA tz Finding ID: P0-4 (B6)

The strict-UTC getter and construction-time warning gated on `config.timezone.length === 0`, so a whitespace-only string (" ", a tab, etc.) bypassed BOTH the warning and the UTC fallback. The getter returned the whitespace verbatim and RollupBuilder threaded it into Intl.DateTimeFormat, which throws RangeError on the first rollup-maintenance pass. Switch both gates to `!config.timezone?.trim()` so empty, whitespace, null and undefined all fall through to UTC and emit the one-time warning. Finding: RD-FIX-1 (P2) from audit/pr516/round-design-B-new-bugs.md.

…olver The optional pi-ai catalog seeding via mod.getModels was wired into the resolver signature but the production call site always passed undefined — pi-ai catalog seeding cannot run synchronously inside createLcmDependencies. The mod-only branch is dead code that adds function-surface noise. Remove the parameter from the signature, the seeding branch from the body, and the undefined arg from the call site. Drop the two mod-specific tests (they were testing dead behavior). Add a regression test pinning last-write-wins precedence across cross-provider model-id collisions and document it in the JSDoc. Finding: RD-FIX-2 (P3) from audit/pr516/round-design-B-new-bugs.md. Also addresses RD-FIX (P3-2) precedence-undocumented gap.

Batch A added \`export\` to computeAutoDetailLevel so the test file could import it directly. That widened the public API surface for what is genuinely internal logic — a downstream consumer importing it would lock in the signature and turn future changes into breaking ones. Drop the export keyword and add the function to the existing __lcmRecentTestInternals namespace alongside the other test hooks (resolvePeriod, extractRollupDigest, ...). Update the test to import via that namespace. Behavior is unchanged. Finding: RD-FIX-3 (P3) from audit/pr516/round-design-B-new-bugs.md.

…y drift execute() observed wall-clock at two independent sites: resolvePeriod (via deps.clock.now() → today day-key) and the currentDayKey gate at the rollup-state-vs-current-day check. With a live clock, two reads a few ms apart could land on different sides of local midnight, producing an inconsistent view of "today" within a single tool call. Capture deps.clock.now() exactly once at execute() entry as callTime and thread the captured Date through both sites. To keep resolvePeriod a pure function of its inputs, its signature changes from (period, timezone, clock: Clock) to (period, timezone, now: Date) — resolveWindowPeriod likewise. Tests in test/rollup-store-builder.test.ts update accordingly. Add a regression test pinning the contract: a stepping clock that crosses local midnight between calls is consumed exactly ONCE per lcm_recent invocation. Finding: RD-FIX-4 (P3) from audit/pr516/round-design-B-new-bugs.md.

…ent rejection + 5 user scenarios Per reviewer/operator feedback: the prior PR description was an architecture dump that didn't explain why v4.1 is positively better than the rollup approach in Martian-Engineering#516, didn't walk through user scenarios, and didn't reference the lcm_recent rejection history. This rewrite: - Leads with "Why we threw out lcm_recent" explaining the three failure modes we hit: repetition, compression-of-compression, and the inability to ask sideways (topic-not-time) questions. - Walks through 5 concrete user scenarios with before/after comparison: yesterday's work / paraphrastic rebase question / operator hard-forget / entity tracking ("all the work I've done with Voyage") / opt-in themes. - Adds a cost discipline table (per-tier model dispatch is the lever). - Adds "What v4.1 is NOT" (intentional non-goals: not RAG, not auto-rollups, not auto-tied to themes). - Operator setup walkthrough with expected log lines. - Architecture diagrams collapsed into <details> for reviewers who want the technical depth but skippable for first read. - Final.review (ec99fd0) findings documented in adversarial review history. Live-DB harness output (still PASSED) preserved as the smoking gun.

Copilot AI review requested due to automatic review settings April 28, 2026 13:58

Copilot started reviewing on behalf of 100yenadmin April 28, 2026 13:59 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Comment thread src/tools/lcm-recent-tool.ts Outdated

Comment thread src/engine.ts Outdated

Comment thread src/rollup-builder.ts Outdated

This was referenced Apr 28, 2026

feat: add LCM observed work density (2/10) #517

Closed

feat: add experimental LCM task bridge suggestions (3/10) #518

Closed

jalehman self-assigned this Apr 28, 2026

100yenadmin requested a review from Copilot May 1, 2026 06:58

Copilot started reviewing on behalf of 100yenadmin May 1, 2026 06:59 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

Comment thread src/tools/lcm-recent-tool.ts Outdated

Comment thread specs/lcm-temporal-memory-plan.md

100yenadmin requested a review from Copilot May 1, 2026 11:23

Copilot started reviewing on behalf of 100yenadmin May 1, 2026 11:24 View session

Copilot AI reviewed May 1, 2026

View reviewed changes

Comment thread src/tools/lcm-recent-tool.ts

100yenadmin requested a review from Copilot May 2, 2026 19:17

Eva added 16 commits May 4, 2026 02:04

100yenadmin force-pushed the feat/lcm-temporal-memory-consolidated branch from 1552e50 to 5902c47 Compare May 3, 2026 19:07

100yenadmin closed this May 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add temporal lcm_recent rollups (1/10)#516

feat: add temporal lcm_recent rollups (1/10)#516
100yenadmin wants to merge 55 commits intoMartian-Engineering:mainfrom
electricsheephq:feat/lcm-temporal-memory-consolidated

100yenadmin commented Apr 28, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

100yenadmin commented Apr 28, 2026

Uh oh!

100yenadmin commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

100yenadmin commented May 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

100yenadmin commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #516: Temporal spine: lcm_recent rollups and bounded temporal recall

Maintainer Quick Read

Why Maintainers Should Care

Stack Map

How It Works

What Ships In This PR

What This PR Intentionally Does Not Do

How An Agent Uses It

Engineering Boundaries

Primary Review Files

Size And Review Surface

Review Concerns Already Folded Into The Code

Validation Evidence

Suggested Maintainer Review Path

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

100yenadmin commented Apr 28, 2026

LCM upstream PR stack scenario flows

Layer model being tested

Scenario 1 — “What did we get done yesterday?”

User asks

Ideal agent flow

Expected answer shape

Ownership boundary

Implementation implications

Scenario 2 — “Where are we on the upstream LCM PRs?”

User asks

Ideal agent flow

Expected answer shape

Ownership boundary

Implementation implications

Scenario 3 — “What’s blocked?”

User asks

Ideal agent flow

Expected answer shape

Ownership boundary

Implementation implications

Scenario 4 — “What happened yesterday between 4 and 8pm?”

User asks

Ideal agent flow

Expected answer shape

Ownership boundary

Implementation implications

Scenario 5 — “Make me a handoff for the next agent”

User asks

Ideal agent flow

Expected artifact sections

Ownership boundary

Implementation implications

Scenario 6 — “Should we create tasks for these unfinished things?”

User asks

Ideal agent flow

Expected answer shape

Ownership boundary

Implementation implications

Scenario 7 — Missing rollup / cold start

User asks

Bad behavior to avoid

Ideal agent flow

Expected answer shape

Ownership boundary

Implementation implications

Scenario 8 — “Did we already file this upstream?”

User asks

Ideal agent flow

Expected answer shape

Ownership boundary

Implementation implications

Scenario 9 — “What should I read first?”

User/agent asks

100yenadmin commented Apr 28, 2026 •

edited

Loading

PR #516: Temporal spine: `lcm_recent` rollups and bounded temporal recall